Abstract


Trading is a key function in distributed applications: it allows
available resources to be discovered at runtime. In order to
standardize this function, the Open Distributed Processing (ODP)
activity and the Object Management Group (OMG) have specified a
trading service for CORBA objects: the CosTrading service. This
specification has two main drawbacks: first, the service is complex
to use from applications; second, it does not offer type checking of
trading requests at compilation time. Both are discussed in this
paper. The main goal of our Trader Oriented Request Broker
Architecture (TORBA) is to provide a trading framework and its
associated tools, offering typed trading operations that are simple
to use from applications and checked at compilation time. To this
end, we define the concept of trading contracts, written in the
TORBA Definition Language (TDL). Such contracts are then compiled to
generate trading proxies offering simple-to-use interfaces. These
interfaces completely hide the complexity of the ODP/OMG CosTrading
APIs. In addition, TDL contracts can be used dynamically through a
generic graphical console exploiting a contract repository. The
example used in this paper illustrates the advantages brought by TDL
trading contracts: type checking at compilation time, simplicity of
use, and a powerful framework for CORBA object trading.


1 Introduction 


Nowadays, building, deploying, and running distributed
applications rely on a set of services/functions
offered by standard middleware like
the Common Object Request Broker Architecture 
[19] (CORBA) of the Object Management Group 
(OMG), the Distributed Component Object Model 
[8] (DCOM) of Microsoft, and more recently the 
Java Remote Method Invocation [24] (RMI) of Sun 
Microsystems. The main functions of such mid- 
dleware solutions are synchronous communication 
using operation invocation, asynchronous commu- 
nication through message or event passing, trans- 
action monitors, security, persistence, and resource 
trading. This paper proposes an innovative frame- 
work named Trader Oriented Request Broker Archi- 
tecture (TORBA) to trade distributed objects over 
CORBA. 


A middleware trading function provides a means to discover
the resources available in a distributed system, in order to
interconnect the various components of an application
dynamically at runtime. For example, it allows a client to locate
its associated server. Such a search may be based
upon various criteria, like the physical location of 
the resource (e.g. to find the printer service of the 
third floor), the symbolic name of the resource (e.g. 
to find the BestPrint printer), or the characteriza- 
tion of the resource using its properties (e.g. find a 
color printer faster than ten pages per minute). The 
conceptual contribution of this paper is to define the 
concept of trading contract in order to characterize 
both the resource properties, and the search opera- 
tions used by client applications. 


The trading function has been studied in both academic projects
and industrial products. Some projects have focused on the benefits
of using such a function in a large-scale context in order to share
resources [16]. In 1993, the ANSA consortium discussed what the
trading function should be [7]. More recently, Sun Microsystems
defined a trading function included in the Jini environment [1].
Relying on the ease with which Java serializes objects, this trading
function allows applications to retrieve serialized objects (like
network stubs or complete services). Other research has focused on
trader federations [5], performance [6], and scalability [26]. In
the context of object trading, the first technical contribution of
this paper is an innovative approach that simplifies and types the
use of a trading function.


In order to standardize the middleware trading function,
the International Organization for Standardization (ISO),
in its Open Distributed Processing [11]
(ODP) activity and the Object Management Group 
(OMG) have defined a specification of the functional 
interfaces of such a function [17] using the OMG In- 
terface Definition Language (OMG IDL). This spec- 
ification is mainly based on the work previously per- 
formed by the ANSA consortium [7] and the DSTC 
[28]. It defines a set of generic APIs for applications 
to export and search CORBA object references in a 
standard and portable way, whatever the underly- 
ing implementation. Unfortunately, these APIs are 
quite complex to use and very technical. Moreover, 
using these APIs does not provide trading request 
type checking at compilation time, but only at run- 
time. Thus, the second technical contribution of 
this paper is to perform trading request type check- 
ing at compilation time, improving software quality 
and reliability. 


The objective of our work is to define and offer a typed trading
environment that is easy to use from CORBA applications. To this
end, we have defined the trading contract concept, used to describe
typed properties (object characterizations) as well as the query
operations to be used by applications. The TORBA Definition Language
(TDL) is used to define these contracts. Contracts are then compiled
to generate trading proxies offering simple specialized interfaces
to client applications. The use of these interfaces is checked at
compilation time, based on their types (i.e. operation signatures).
Furthermore, the proxy implementations completely hide the technical
complexity of the ODP/OMG trader interfaces. In addition, TDL
contracts can be stored in a TDL repository, just as OMG IDL
definitions are stored in CORBA's Interface Repository. This
repository can then be used dynamically from a graphical console to
discover available trading offers and to invoke the defined query
operations.

Section 2 of this paper presents an overview of the ODP/OMG
CosTrading service. This overview focuses on trading offer typing
and on the use of the query operation, in order to outline their
drawbacks: technical complexity and lack of type checking. Section 3
discusses the trading contract concept, the TDL language, the proxy
generation and execution process, as well as the dynamic approach.
It also presents the implementation of TORBA, using a printer
service as an example to underline the benefits of our approach.
Since TORBA has so far only been used on simple examples, Section 4
presents some empirical results. Section 5 discusses related
middleware work on which TORBA builds: the proxy concept, the
structure of ORBs, and the component-oriented approach. Finally,
Section 6 summarizes this work in progress and outlines future work
directions.


2 The CosTrading Service 


2.1 Overview 


The ODP/OMG CosTrading service is similar to 
a search engine for CORBA object service refer- 
ences. Figure 1 presents the CosTrading standard 
use, composed of four steps. (1) Service designers 
define their service offer types (see section 2.2). (2) 
Service providers or application servers character- 
ize and export their service offers using properties 
describing the service. (3) Service users or client 
applications search service references using criteria 
describing their requirements. (4) Once references 
have been retrieved, clients invoke operations on 
the services. All these requests—definition, export, 
lookup, and use—are carried by CORBA. 









Figure 1: The ODP/OMG CosTrading Service Use.


The CosTrading service provides three main interfaces for
applications. The ServiceTypeRepository interface is used to define
and manage service offer types. The Register interface is used to
export service offers. Finally, the Lookup interface is used to
search exported service offers. Other interfaces are available for
administration purposes, for example to set the behavior of the
trader and its search operation, or to build trader federations
offering a potentially large-scale, unified trading service [2].
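
To make the four steps and the three interfaces more concrete, the following sketch mimics them with a toy in-memory trader written in Python. It is only an illustration under simplifying assumptions: the ToyTrader and FakePrinter classes and their method names are hypothetical, a Python predicate stands in for the constraint string, and nothing here is the actual CosTrading API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTrader:
    """Hypothetical in-memory stand-in for a trader (not the CosTrading API)."""
    types: dict = field(default_factory=dict)    # type name -> property names
    offers: list = field(default_factory=list)   # (type name, reference, properties)

    # Step 1: plays the role of the ServiceTypeRepository interface
    def add_type(self, name, property_names):
        self.types[name] = set(property_names)

    # Step 2: plays the role of the Register interface
    def export(self, type_name, reference, properties):
        if type_name not in self.types:
            raise KeyError("unknown service type: " + type_name)
        unknown = set(properties) - self.types[type_name]
        if unknown:
            raise ValueError("undeclared properties: " + ", ".join(sorted(unknown)))
        self.offers.append((type_name, reference, dict(properties)))

    # Step 3: plays the role of the Lookup interface
    # (a predicate replaces the constraint string of the real service)
    def query(self, type_name, predicate):
        if type_name not in self.types:
            raise KeyError("unknown service type: " + type_name)
        return [ref for t, ref, props in self.offers
                if t == type_name and predicate(props)]


class FakePrinter:
    """Stands in for a CORBA object reference to a printer service."""
    def __init__(self, queue):
        self.queue = queue
        self.printed = []

    def print(self, filename):
        self.printed.append(filename)


trader = ToyTrader()
trader.add_type("Printer", ["color", "cost_per_page", "ppm", "queue"])

laser = FakePrinter("laser")
inkjet = FakePrinter("inkjet")
trader.export("Printer", laser, {"color": False, "ppm": 20, "queue": "laser"})
trader.export("Printer", inkjet, {"color": True, "ppm": 4, "queue": "inkjet"})

# color printers faster than two pages per minute
found = trader.query("Printer", lambda p: p.get("color") and p.get("ppm", 0) > 2)

# Step 4: use the retrieved references
for printer in found:
    printer.print("report.ps")
```

In this toy setting only the inkjet offer matches, so only that object receives the print request.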


2.2 Service Offer Types 


In the CosTrading service, service offers are strongly typed. Any
export operation is based on an offer type; similarly, lookup
operations are performed upon a given type. Typed offers bring two
main advantages. First, applications cannot export or search for
arbitrary offers, but only offers whose type was defined at design
time. Second, a CosTrading implementation may take advantage of
types to improve performance, searching only the offers of a given
type instead of all the offers. This becomes vital when several
thousand offers have been exported. Part of the CosTrading service,
the Service Type Repository stores the various service offer types.
It also provides type checking when exporting or searching offers.
In this repository, a service offer type is characterized by four
elements:


• a name, which is a unique global identifier in a trader
  federation and is used to define, export, and search service
  offers,

• some inherited super types, used with particular redefinition
  rules which are not discussed here,

• an OMG IDL interface to which exported service references have
  to conform, and

• some service properties characterizing the exported service.


Each service property is characterized using: 


• a name, which is also a unique identifier within a service type,

• an OMG IDL type which characterizes the type of the property
  values, and

• a using mode, which has to be set to one of:

    - normal: giving a value to such a property is optional at
      creation time. If a value is given, it can be modified or
      removed during execution by the service provider.

    - readonly: giving a value to such a property is not required,
      but if one is given it cannot be modified.

    - mandatory: the service provider has to give a value to such a
      property at export time.

    - mandatory readonly: the provider has to give a value to the
      property, and this value cannot change during execution.
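
The four modes combine two independent questions: must a value be given at export time, and may it be modified afterwards? The Python sketch below encodes this reading. The Mode enumeration, the helper names, and the sample assignment of modes to properties are hypothetical and chosen only for illustration; they are not part of the CosTrading interfaces.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    READONLY = "readonly"
    MANDATORY = "mandatory"
    MANDATORY_READONLY = "mandatory readonly"


# modes that require a value at export time
MANDATORY_MODES = {Mode.MANDATORY, Mode.MANDATORY_READONLY}
# modes that forbid modifying a value once it has been given
READONLY_MODES = {Mode.READONLY, Mode.MANDATORY_READONLY}


def missing_at_export(modes, values):
    """Return the names of mandatory properties absent from an export."""
    return sorted(name for name, mode in modes.items()
                  if mode in MANDATORY_MODES and name not in values)


def may_modify(modes, name):
    """Return True if the provider may modify the value of the property."""
    return modes[name] not in READONLY_MODES


# Hypothetical mode assignment, used only to exercise the four cases.
modes = {
    "color": Mode.MANDATORY_READONLY,
    "ppm": Mode.MANDATORY,
    "cost_per_page": Mode.NORMAL,
    "queue": Mode.READONLY,
}
```

For instance, exporting an offer that only sets color would be rejected by this sketch because ppm is missing, and a later change of queue would be refused.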


Figure 2 presents the OMG IDL interface of a printer service. This
service is used in this paper to illustrate various aspects of the
trading service and our TORBA proposal.


    interface PrinterServer {
        void print (in string filename) ;
    };


Figure 2: OMG IDL of a Simple Printer Service. 


The related service offer type is Printer, which is characterized
by the four properties presented in Table 1. The color property
specifies whether a printer can print in color or only in B&W. The
cost_per_page property contains the cost of printing a sheet of
paper on this printer. The ppm property contains the number of pages
per minute the printer can produce. Finally, the queue property is
the name of the printer queue.


    Name             Type              Mode
    color            boolean           normal
    cost_per_page    float             normal
    ppm              unsigned short    normal
    queue            string            normal

Table 1: Printer Service Offer Properties.


As stated earlier, it is important to have typed offers. However,
from a software quality point of view, the CosTrading service lacks
a standard language to describe offer types. The only available
means is to use the add_type() operation of the
ServiceTypeRepository interface provided by the CosTrading service.
Section 3.2 discusses how the TORBA Definition Language (TDL)
addresses this problem.

    module CosTrading {
      interface Lookup : TraderComponents,
                         SupportAttributes,
                         ImportAttributes {
        void query (
          in  ServiceTypeName type,
          in  Constraint      constr,
          in  Preference      pref,
          in  PolicySeq       policies,
          in  SpecifiedProps  desired_props,
          in  unsigned long   how_many,
          out OfferSeq        offers,
          out OfferIterator   offer_itr,
          out PolicyNameSeq   limits_applied
        ) raises (
          IllegalServiceType,
          UnknownServiceType,
          IllegalConstraint,
          IllegalPreference,
          IllegalPolicyName,
          PolicyTypeMismatch,
          InvalidPolicyValue,
          IllegalPropertyName,
          DuplicatePropertyName,
          DuplicatePolicyName
        );
      };
    };

Figure 3: The Lookup Interface to Search Offers.


2.3 Searching Service Offers


As this paper focuses on the search process, the 
drawbacks of the export process are not discussed 
here. However, these drawbacks are similar to those 
presented in this section. 


Once offers have been exported by servers, their references and
properties can be retrieved using the CosTrading search operation.
Figure 3 presents the Lookup interface used to perform searches. The
query operation allows clients to find services among the set of
exported offers. This operation takes a large number of arguments,
because it has to be generic enough to be usable in a wide range of
applications.


The type parameter defines the offer type required by the client
application. The constr parameter contains a constraint to be
matched by the properties of selected offers. This constraint is a
string containing a boolean expression written in the OMG Constraint
Language (OCL). The pref parameter specifies, in OCL, the order of
the returned offers. The policies parameter specifies the strategies
to be used during the search. The desired_props argument specifies
the properties to be returned to the client for each offer: none,
all, or only specific ones. As there may be a huge number of
matching offers, the how_many argument sets the maximum number of
offers to be returned. The remaining offers can be retrieved later
on using the offer_itr iterator provided by the operation. Finally,
the offers and limits_applied out parameters contain, after
processing, the result and the limits actually applied to the search
policies.
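
The interplay between how_many, the returned offers, and the iterator can be pictured with a small Python sketch. The function names below are hypothetical and only model the idea of returning a first batch plus a handle on the remaining offers; they do not reproduce the signatures of the CosTrading interfaces.

```python
from itertools import islice


def query(matching_offers, how_many):
    """Return (offers, offer_itr): at most how_many offers now, the rest lazily."""
    remaining = iter(matching_offers)
    offers = list(islice(remaining, how_many))
    return offers, remaining


def next_n(offer_itr, n):
    """Fetch up to n further offers; an empty list means nothing is left."""
    return list(islice(offer_itr, n))


# seven matching offers, retrieved three at a time
first_batch, itr = query(range(7), 3)
second_batch = next_n(itr, 3)
third_batch = next_n(itr, 3)
fourth_batch = next_n(itr, 3)
```

A client written against the real service has to implement the same loop by hand, which is part of the code that TORBA proxies aim to hide.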


Furthermore, when given wrong parameter values, the query operation
raises one of the ten listed exceptions. Such exceptions indicate a
misuse of the CosTrading service related to search strategies or to
its type model, like an illegal or unknown type name, a badly
expressed preference or constraint, or an unknown property name.
Thus, the CosTrading service only checks requests at runtime,
whereas type checking could be performed at application compilation
time, improving both software quality and service performance by
avoiding runtime type checking. At the moment, TORBA, as discussed
in the following sections, mainly addresses type checking at
application compilation time.


Figure 4 presents how an application, written in OMG IDLscript¹
[21], may retrieve offers for color printers faster than two pages
per minute. The offers, iter, and limits variables are initialized
to receive the results of the query() operation. The offer type, the
property constraint, and the result order are provided as strings.
Thus, it is up to the CosTrading server to check and evaluate these
strings in order to perform the search, which means that type errors
are only discovered at runtime.


    # variables to receive answers
    # using the out mode
    offers = Holder()   # returned offers
    iter   = Holder()   # next offers iterator
    limits = Holder()   # limits applied
    trader.query (
        "Printer",      # offer type required
        # OCL for offer constraint
        "color == TRUE and ppm > 2",
        "first",        # answer order
        # use default strategy
        CosTrading.PolicySeq(),
        # properties to be returned
        CosTrading.Lookup.SpecifiedProps (
            CosTrading.Lookup.HowManyProps.some,
            ["queue", "color",
             "cost_per_page", "ppm"]
        ),
        # max number and out params
        100, offers, iter, limits
    )

Figure 4: Searching offers using IDLscript.

The simplicity of this excerpt relies on the use of the OMG
IDLscript language. However, a real application has to set the
search strategy, catch and process the potential exceptions, and
process the returned results. The latter includes processing the
offers sequence and, potentially, using the iter iterator to process
the following offers. Thus, about fifty lines of Java or C++ are
required just to obtain the list of color printers faster than two
pages per minute. We therefore claim that the query operation is
complex and very technical to use. Moreover, heavy use of this
operation forces applications to build, invoke, and process many
trading requests, introducing code complexity and potential runtime
errors. Section 3.3 discusses how TORBA automates the production of
this trading-related technical code to simplify application code.

¹ IDLscript is the CORBA 3.0 scripting language, a contribution and
specification of our CorbaScript project [4, 13, 14].


2.4 Review 


Beyond presenting the main operations provided by the ODP/OMG
CosTrading service to type and search offers, this section outlines
the drawbacks of these functions. First, the CosTrading service does
not provide a definition language for offer types. Such a language
is mentioned in the CosTrading specification, but only for
illustrative purposes. The service only relies on a type repository
used at runtime. Then, the technical nature and complexity of this
service have been discussed. In order to benefit from the CosTrading
service, it is necessary to master operations like query and the
data structures provided by the service. Finally, using strings to
manipulate properties implies runtime type checking and rules out
type checking at compilation time. This makes it harder to produce
reliable applications efficiently. To summarize, like any CORBA
service, the CosTrading service only offers a set of complete OMG
IDL interfaces. This raises the following four questions.


• How to simplify the use of the CosTrading service?

• How to provide type checking at compilation time?

• Which language should be used in order to define offer types?

• Which framework should be applied to trading?


Looking at today's software industry, three answers arise. First, a
GUI could be provided to use the trading service easily. This
solution is already available for many trading service products.
Nevertheless, this choice does not address the use of the trading
service from an application. Second, a library may hide the trading
service complexity. However, providing such a library is a huge
task, and it could easily suffer from the same drawbacks as the
CosTrading interfaces. Moreover, it would only define a programming
framework, with neither a design method nor a language to specify
offer types. Finally, a trading function specialized to the needs of
each application could be defined using the concept of trading
contract, as discussed in the following section.


3 The TORBA Proposal 


3.1 The Trading Contract Approach 


The objective of TORBA is to provide a simple and strongly typed
trading facility for CORBA applications. To this end, TORBA is based
upon the ODP/OMG CosTrading service, taking full advantage of its
strengths, such as available implementations, complex lookup
algorithms, offer persistence, and large-scale trader federations.


The conceptual benefit of TORBA is the concept of trading
contracts. Such a contract is defined at application design time, in
the same way as OMG IDL interface contracts are defined [9]. These
contracts take into account the needs of offer providers as well as
those of client applications: this results in the definition of
trading offer types. First, offer types clearly identify and group
together the properties (i.e. name and type of the values)
characterizing exported CORBA objects that conform to a given OMG
IDL interface. Second, offer types also contain a list of query
operations commonly used in client applications. Each such operation
is characterized by a signature (name and parameters), as well as a
boolean constraint to be applied to both the parameters and the
properties of the associated type. Offer types may be classified
using multiple inheritance. Such a classification permits designers
to define abstract types, like a device, that can be specialized
into concrete types, like a scanner and a printer. Moreover,
concrete types can also be inherited in order to define new query
operations exactly meeting the requirements of client applications.
Using multiple inheritance improves the reuse of properties and
query operations.

6th USENIX Conference on Object-Oriented Technologies and Systems


The technical benefit of TORBA is to provide a complete generation
and execution environment for trading contracts. Offer types are
defined using the TORBA Definition Language (TDL). Such definitions
are then compiled to generate trading proxies offering easy-to-use
OMG IDL interfaces to applications. The use of these specialized
interfaces is thus checked at application compilation time.
Moreover, proxy implementations fully hide the technical details of
the ODP/OMG CosTrading service. Such implementations are generated
for several programming languages: OMG IDLscript and Java, with C++
planned later on. In addition, trading contracts can also be used
dynamically through a generic graphical console. Trading contracts
are stored in a repository and browsed by the console, which can
also invoke the query operations defined for a given type.


3.2 The TORBA Definition Language


The TORBA Definition Language (TDL) is the formalism used to define
TORBA trading contracts. Using simple typed constructs, it describes
offer types, their inheritance relations, their properties (name and
type), as well as their query operations (name, parameters, and
constraint). Property and parameter types rely upon the OMG IDL type
model. Constraints are defined using the OMG Constraint Language
(OCL), extended to take query operation parameters into account and
to allow the composition of query operations. TDL is defined in two
forms: an XML DTD and a BNF grammar. This paper only describes the
second one, which is more concise and quite familiar to CORBA users.
Figure 5 presents an example of offer type definitions.

    abstract offer Device {
        property string name ;
        query all () is TRUE ;
    };

    offer Printer : Device {
        interface PrintService ;
        property boolean color ;
        property float cost_per_page ;
        property unsigned short ppm ;

        query colors () is color == TRUE ;
        query faster (in unsigned short s)
            is ppm > s ;
        query faster_colors (in unsigned short s)
            is colors () and faster (s) ;
    };

Figure 5: Trading Offer Type Definition using TDL.


A trading offer type is defined using the offer keyword followed by
the type name and, possibly, the list of inherited super-types. By
default, a type is concrete: providers can export offers using this
type. It then has to include an interface entry defining the base
interface to be supported by exported objects. The abstract keyword
defines a type as being abstract: no offer may be exported for this
type, which is meant to be inherited to define concrete types. The
TDL contract of Figure 5 defines two offer types related to the
printer example of this paper: the Printer concrete type inherits
from the Device abstract type, and specifies offers for objects
implementing the PrintService interface (or one of its
sub-interfaces). Properties are defined using the property keyword
followed by an optional access mode, an OMG IDL type, and a formal
name. If no access mode is given, the mode is normal (see section
2.2).


The Printer offer includes the four following properties: the name
string inherited from Device, the color boolean, the cost_per_page
float, and the ppm unsigned short. Search operations are defined
using the query keyword followed by a name, an optional list of
arguments (defined as for OMG IDL operations), and a constraint. The
constraint is based upon the properties of the offer type (e.g. the
colors() query), the properties and the parameters (e.g. the
faster() query), or a composition of query operations (e.g. the
faster_colors() query). The all() query is defined with TRUE as its
constraint in order to retrieve all the available offers for the
Printer type. Query operation inheritance has the following
semantics: the constraint is kept, but it applies to the inheriting
type rather than to the super-type. The operation implementation is
implicitly overridden in generated proxies. When applied to the
Device type, the all() operation returns all the available device
offers. When applied to the Printer type, it only returns the
available printer offers.
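
The composition and inheritance rules just described can be illustrated by a short Python sketch. It assumes, purely for illustration, that a composed query is resolved by inlining the constraints of the queries it calls and substituting actual parameter values; the paper does not state that the TDL compiler works this way. The CONTRACT table and the expand and request helpers are hypothetical names.

```python
import re

# Hypothetical in-memory form of the Figure 5 contract:
# offer type -> (super-types, {query name: (parameter names, constraint)})
CONTRACT = {
    "Device": ((), {"all": ((), "TRUE")}),
    "Printer": (("Device",), {
        "colors": ((), "color == TRUE"),
        "faster": (("s",), "ppm > s"),
        "faster_colors": (("s",), "colors() and faster(s)"),
    }),
}

CALL = re.compile(r"(\w+)\s*\(([^()]*)\)")  # matches "name(arguments)"


def queries_of(offer):
    """Collect the queries of an offer type, inherited ones first."""
    supers, own = CONTRACT[offer]
    merged = {}
    for parent in supers:
        merged.update(queries_of(parent))
    merged.update(own)
    return merged


def expand(offer, name, args=()):
    """Return the plain constraint string for one query invocation."""
    queries = queries_of(offer)
    params, constraint = queries[name]
    if len(params) != len(args):
        raise TypeError("wrong number of arguments for " + name)
    bound = dict(zip(params, (str(a) for a in args)))

    def inline(match):
        callee, raw = match.group(1), match.group(2)
        if callee not in queries:
            return match.group(0)  # not a query call, keep it unchanged
        actuals = [bound.get(a.strip(), a.strip())
                   for a in raw.split(",") if a.strip()]
        return "(" + expand(offer, callee, actuals) + ")"

    text = CALL.sub(inline, constraint)
    # replace the remaining formal parameters by the actual values
    return re.sub(r"\b\w+\b", lambda m: bound.get(m.group(0), m.group(0)), text)


def request(offer, name, *args):
    """Pair the offer type to search with the expanded constraint."""
    return offer, expand(offer, name, args)
```

With these definitions, request("Printer", "all") and request("Device", "all") carry the same TRUE constraint but target different offer types, which is the inheritance semantics described above.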


Defining specific queries for given property values should not be
overused. The point is not to define a query for every potential
property value, but to define the most commonly used queries. For
queries that are only needed from time to time, the generic query
operation available with all types has to be used (see section
3.3.1). Even so, using the generic query() generated by TORBA
already brings type checking and reduces the technical complexity of
the lookup mechanism.


Two constraints are implied by the use of TDL contracts. First,
like OMG IDL contracts, TDL contracts have to be globally known to
clients. Moreover, the TDL type hierarchy may be extended but has to
stay consistent, i.e. no TDL contract should be removed or modified.
Second, each TDL contract has to be defined using an identifier that
is unique in the whole system. Here too, as with OMG IDL
definitions, it is important for designers to define their TDL
contracts within modules in order to avoid name collisions.


This section has presented the second TDL formalism (the BNF
grammar), which is simple to learn. This basis will be extended
according to the needs arising from our experiments. As an example,
the specification of dynamic properties, whose values are computed
at runtime rather than statically set at export time, seems an
interesting extension. However, it is important for this language
not to become too complex, and underused because of this complexity.


3.3 Trading Proxy Generation


Once trading contracts have been defined using 
TDL, they may be compiled to generate trading 
proxies for applications, as depicted in Figure 6. 


Figure 6: TDL Language Compilation Process.

The TDL compiler checks both the syntax and the semantics of TDL
definitions. Semantic checking verifies OMG IDL type correctness,
TDL type names and properties, as well as OCL constraints, in order
to ensure that no type-related problem can arise at runtime. The TDL
compiler produces the proxy OMG IDL interfaces, as well as their
implementation for a given language (IDLscript and Java for the
moment, C++ later on). For portability, the TDL compiler is written
in Java, based on a lexical and syntactic analyzer generated using
JavaCC [15].
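
One of the semantic checks mentioned above, verifying that a constraint only refers to declared names, can be approximated in a few lines of Python. This is a deliberately naive sketch and not the actual TDL compiler, which is written in Java with JavaCC: it assumes that identifiers can be extracted with a regular expression, that string literals are single-quoted, and that the keyword set listed below (a small subset chosen for the example) is sufficient.

```python
import re

# Subset of constraint-language keywords assumed for this sketch.
KEYWORDS = {"and", "or", "not", "TRUE", "FALSE"}


def undeclared_names(constraint, properties, parameters=(), queries=()):
    """Return the identifiers of a constraint that the contract does not declare."""
    known = KEYWORDS | set(properties) | set(parameters) | set(queries)
    without_strings = re.sub(r"'[^']*'", "", constraint)  # drop quoted literals
    names = re.findall(r"[A-Za-z_]\w*", without_strings)
    return sorted(set(names) - known)


# properties of the Printer offer type of Figure 5
PRINTER_PROPERTIES = {"name", "color", "cost_per_page", "ppm"}
```

Applied when the contract is compiled, such a check reports a misspelled property (for example colour instead of color) before any application is run, whereas the string-based query of Section 2.3 would only fail at runtime.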


3.3.1 Generated OMG IDL Interfaces 


Each definition of trading offer type is mapped to 
an OMG IDL module named as the offer type and 
containing the five following definitions. 


• The Offer structure represents a trading offer. It contains the
  exported object reference and a field for each property defined in
  the offer type or its super-types. Field types are those of the
  service interface and of the properties, as defined in the TDL
  offer.

• The OfferSeq sequence is used by query operations to return
  matching offers.

• The OfferType, Export, and Lookup interfaces respectively describe
  the Service Type Repository access proxy, the export proxy, and
  the lookup proxy. The latter inherits from the TORBA::Lookup
  interface and contains an operation for each query definition. It
  also contains a generic but nonetheless typed query operation.


Figure 7 presents an excerpt (the lookup proxy in- 
terface) of the OMG IDL definitions generated for 
the Printer trading contract as defined in Figure 5. 







    #include <TORBA.idl>
    module Printer {
        struct Offer {
            PrintService service ;
            string name ;
            boolean color ;
            float cost_per_page ;
            unsigned short ppm ;
        };

        typedef sequence<Offer> OfferSeq ;

        interface Lookup : TORBA::Lookup {
            OfferSeq query_all () ;
            OfferSeq query_colors () ;
            OfferSeq query_faster
                (in unsigned short s) ;
            OfferSeq query_faster_colors
                (in unsigned short s) ;
            OfferSeq query (in TORBA::Query q)
                raises (TORBA::IllegalConstraint) ;
        };
        // interfaces for type definition
        // and exportation
    };


Figure 7: OMG IDL Module Generated from the 
Printer TDL Contract (excerpt) 


The Printer offer type is mapped to the Printer OMG IDL module. The
Offer structure represents a printer offer. It contains a field for
the exported print service, as well as for the name, color,
cost_per_page, and ppm properties. The lookup proxy query_all(),
query_colors(), query_faster(), and query_faster_colors() operations
represent the queries defined in the Printer contract. Parameters
are the same as those defined in the contract, while their return
type is a printer offer sequence (i.e. OfferSeq). The last query()
operation allows applications to perform searches not defined in the
TDL contract. The TORBA::IllegalConstraint exception may be raised
at runtime if the constraint is malformed.
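
The mapping from query definitions to lookup proxy operations is regular enough to be sketched as a small text generator. The Python function below is a hypothetical illustration of that mapping for the Lookup interface only; its name and its input format are assumptions, and it is not the code generator of the TDL compiler.

```python
def lookup_interface(queries):
    """Render a Lookup proxy interface from (name, [(idl_type, param), ...]) pairs."""
    lines = ["interface Lookup : TORBA::Lookup {"]
    for name, params in queries:
        # every TDL query "q" becomes an operation "query_q" returning OfferSeq
        args = ", ".join("in %s %s" % (idl_type, param) for idl_type, param in params)
        lines.append("    OfferSeq query_%s (%s) ;" % (name, args))
    # the generic query operation is always added
    lines.append("    OfferSeq query (in TORBA::Query q)")
    lines.append("        raises (TORBA::IllegalConstraint) ;")
    lines.append("};")
    return "\n".join(lines)


printer_queries = [
    ("all", []),
    ("colors", []),
    ("faster", [("unsigned short", "s")]),
    ("faster_colors", [("unsigned short", "s")]),
]
idl_text = lookup_interface(printer_queries)
```

Printing idl_text yields an interface with the same operations as the Lookup interface of Figure 7, up to line breaks.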


Experiments have been performed using the generation rules
presented here, validating these choices. As an example, the Offer
structure is a good means to check export and lookup operations at
compilation time. However, we also intend to experiment with
valuetypes² instead of the structure, as well as with a typed
iterator interface instead of the sequence.

² Since CORBA 2.3, valuetypes permit argument objects to be passed
by value instead of by reference.

    # File 'PrinterProxies.cs'
    import TORBArt
    class Lookup (TORBArt.LookupBase) {
        proc __Lookup__ (self) {
            self.__LookupBase__ ("Printer",
                                 Printer.Lookup)
        }
        proc query (self, constraint) {
            answers = []
            for offer in
                self.generic_query (constraint) {
                answers.append (
                    Printer.Offer (
                        offer["service"],
                        offer["name"], offer["color"],
                        offer["cost_per_page"],
                        offer["ppm"]
                    ))
            }
            return answers
        }
        proc query_faster (self, s) {
            # strict comparison, as in the TDL contract of Figure 5
            return self.query ("ppm > " +
                               s._toString())
        }
        proc query_all (self) {
            return self.query ("TRUE")
        }
        # other query operations
    }

Figure 8: OMG IDLscript implementation of the 
Printer lookup proxy (excerpt) 


3.3.2 Generated Proxy Implementation 


The generation of proxy implementations depends on the constructs
of the target language. However, in an object-oriented language,
each OMG IDL interface is implemented by a class inheriting from a
base class provided by the TORBA runtime. These classes fully hide
the technical details of the ODP/OMG CosTrading service: use of the
service interfaces and data structures, as well as exception
handling. Such runtime classes provide generic operations used by
proxy implementations. Figure 8 presents an excerpt of the lookup
proxy implementation for the Printer offer type, generated for the
OMG IDLscript language.

The Lookup class inherits from the TORBArt.LookupBase class
provided by the TORBA runtime. The __Lookup__ constructor invokes
the super-class constructor, providing the TDL type name (i.e.
Printer) as well as the implemented interface type (i.e.
Printer::Lookup). The query operation invokes the generic query
operation, providing the constraint to apply. Then, the result is
translated to a printer offer sequence (i.e. Printer::OfferSeq). The
implementation of query operations, like query_faster and query_all,
only consists of creating the associated constraint and invoking the
query operation.

    lookup = PrinterProxies.Lookup()
    offers1 = lookup.query_faster_colors (2)
    offers2 = lookup.query ("color == FALSE
        and cost_per_page < 0.05 and ppm > 10")

Figure 9: Printer Search Proxy Use.


3.4 Using TORBA Proxies 


Figure 9 presents, in OMG IDLscript, the use of the lookup proxy
presented in the previous section. The first line instantiates the
lookup proxy class. The second line invokes the query_faster_colors
operation to find color printers faster than two pages per minute.
This operation performs the same search as the one presented in
Figure 4. The simplicity brought by TORBA becomes clear: application
developers need not deal with the technical details of the trader and
can focus on the use of the trading contract only. Moreover, the
operation cannot fail because of a typing error, as types have been
checked by the TDL compiler; it can only return an empty sequence if
no offer matches the search. The third line illustrates the option of
using a search operation not defined in the trading contract:
searching for offers of black-and-white printers faster than ten pages
per minute, at a cost of less than five cents per page. Nevertheless,
even though such ad hoc requests are supported, software engineering
quality is improved when all the search requests are defined in the
trading contract.
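
The proxy pattern described in Sections 3.3.2 and 3.4 can be sketched in Python as follows. This is only an illustration under simplifying assumptions, not the TORBA runtime: the trader is replaced by an in-memory list, constraints are Python predicates rather than CosTrading constraint strings, and the names PrinterOffer, LookupBase, PrinterLookup, and OFFERS are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrinterOffer:
    """Typed view of one Printer offer (field names follow Figure 8)."""
    service: str
    name: str
    color: bool
    cost_per_page: float
    ppm: int


class LookupBase:
    """Hypothetical stand-in for a runtime base class.

    A real runtime would forward a constraint to the trader; here the
    "trader" is an in-memory list and the constraint is a predicate.
    """

    def __init__(self, type_name, offers):
        self.type_name = type_name
        self._offers = offers

    def generic_query(self, predicate):
        # Untyped result: plain dictionaries, as a generic API would return.
        return [offer for offer in self._offers if predicate(offer)]


class PrinterLookup(LookupBase):
    """Typed proxy: every operation returns a list of PrinterOffer."""

    def __init__(self, offers):
        super().__init__("Printer", offers)

    def query(self, predicate):
        return [PrinterOffer(o["service"], o["name"], o["color"],
                             o["cost_per_page"], o["ppm"])
                for o in self.generic_query(predicate)]

    def query_faster(self, speed):
        # Predefined query: only the constraint is built here.
        return self.query(lambda o: o["ppm"] >= speed)

    def query_all(self):
        return self.query(lambda o: True)


# Illustrative offers; the values are made up for the example.
OFFERS = [
    {"service": "ref-1", "name": "laser", "color": False,
     "cost_per_page": 0.02, "ppm": 20},
    {"service": "ref-2", "name": "inkjet", "color": True,
     "cost_per_page": 0.10, "ppm": 4},
]

lookup = PrinterLookup(OFFERS)
fast = lookup.query_faster(10)
```

As in Figure 8, each predefined query reduces to building a constraint and delegating to query, so the translation from untyped results to typed offers is written once.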


3.5 Execution of TORBA Proxies 


Figure 10 presents the set of objects involved during the execution of
a query operation. The lookup proxy object and its CORBA stub are
co-located with the application. The application invokes the proxy
operations through the proxy's OMG IDL interface. The proxy operation
implementation invokes the TORBA runtime class, providing the
appropriate constraint. Then, this class invokes the CORBA stub
providing access to the ODP/OMG CosTrading service. On return, the
runtime class catches any exceptions and the proxy class translates
data from its CosTrading representation to the representation defined
in the trading contract. Future work includes measuring the overhead
of lookup proxies and optimizing their implementation in order to
approach native CosTrading performance (see Section 4).


[Figure 10 diagram; recoverable labels: Client Application, Lookup
Proxy (Lookup OMG IDL, Lookup Class), ODP/OMG CosTrading, CosTrading
Service]

Figure 10: Execution Process of Lookup Proxy Operations.


3.6 The TORBA Dynamic Approach 


Previous sections have presented the conceptual benefits of TDL
contracts, as well as the technical ones brought by the generation and
execution of the related proxies. In addition, the TORBA environment
offers a dynamic approach to using trading contracts, as depicted in
Figure 11. This approach permits one to build applications without
static, design-time knowledge of the trading contracts used. This
knowledge is acquired at runtime.


The dynamic approach in TORBA relies on a trading contract repository.
This repository, currently written in OMG IDLscript, stores TDL
contracts as a graph of CORBA objects. Each object of the graph
represents, at runtime, a semantic construct of the TDL language.
Thus, offer, property, and query constructs are mapped to the
OfferDef, PropertyDef, and QueryDef interfaces defined in the TORBA
module. These interfaces provide operations to create and browse
objects of the graph, providing TDL information at runtime. Creation
operations are used by a specific version of the TDL compiler in order
to feed the repository. The other operations are used by any TORBA
application requiring dynamic discovery of available trading
contracts. In order to validate this approach, we have written a first
dynamic application in JavaIDLscript³: the TORBA explorer, illustrated
in Figure 12.


Figure 11: The TORBA Dynamic Approach.


Through a GUI written using Java Swing, the 
TORBA explorer allows users to browse available 
trading contracts, to select a contract, to consult 
associated offers, and to perform predefined or spe- 
cific query operations. The explorer implementation 
does not rely on any particular trading contract: graphical
interfaces are built dynamically at runtime according
to the trading contracts discovered in the TORBA
repository. Thus, the TORBA explorer provides
a trading GUI dedicated to the contracts used by
applications, unlike the GUIs included with CosTrading
implementations. Finally, this explorer is a generic
and graphical proof of the relevance and strength of 
the trading contract concept presented in this pa- 
per. 
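
The idea of a contract repository browsed at runtime can be sketched in Python as follows. This is a minimal sketch, not the TORBA repository: the class names OfferDef, PropertyDef, and QueryDef come from the text, but their fields and operations, the ContractRepository class, the describe function, and the property types shown are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyDef:
    name: str
    type_name: str


@dataclass
class QueryDef:
    name: str
    parameters: list
    constraint: str


@dataclass
class OfferDef:
    name: str
    properties: list = field(default_factory=list)
    queries: list = field(default_factory=list)

    # Creation operations: what a compiler would call to feed the graph.
    def create_property(self, name, type_name):
        prop = PropertyDef(name, type_name)
        self.properties.append(prop)
        return prop

    def create_query(self, name, parameters, constraint):
        query = QueryDef(name, parameters, constraint)
        self.queries.append(query)
        return query


class ContractRepository:
    """Hypothetical in-memory repository holding the contract graph."""

    def __init__(self):
        self._offers = {}

    def create_offer(self, name):
        offer = OfferDef(name)
        self._offers[name] = offer
        return offer

    # Browsing operations: what a generic tool would call at runtime.
    def offer_names(self):
        return sorted(self._offers)

    def lookup(self, name):
        return self._offers[name]


def describe(repository):
    """Walk the graph with no static knowledge of any contract."""
    lines = []
    for offer_name in repository.offer_names():
        offer = repository.lookup(offer_name)
        lines.append(f"offer {offer.name}")
        for prop in offer.properties:
            lines.append(f"  property {prop.name}: {prop.type_name}")
        for query in offer.queries:
            params = ", ".join(query.parameters)
            lines.append(f"  query {query.name}({params}): {query.constraint}")
    return lines


repo = ContractRepository()
printer = repo.create_offer("Printer")
printer.create_property("ppm", "short")        # type chosen for illustration
printer.create_property("color", "boolean")
printer.create_query("query_faster", ["s"], "ppm >= s")
summary = describe(repo)
```

The describe function plays the role of a generic client: it depends only on the repository interfaces, so it keeps working when new contracts are added.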


4 Empirical Results 


TORBA introduces extra processing when sending requests to the
CosTrading service, which implies an overhead. However, in the context
of distributed applications, overhead must be evaluated at two levels.
First, there are remote method invocations, whose overhead is
potentially high. Second, there are local method invocations, whose
overhead is most of the time insignificant compared to that of remote
ones. In the context of TORBA, the overhead is introduced by the use
of a local library, which means local invocations only: three local
invocations are added for each trading request. Thus, TORBA only
increases local processing time and keeps the number of remote method
invocations identical to that of the standard use of the CosTrading
service. Based on early tests performed using the ORBacus Trader, the
overhead introduced by TORBA is less than 5%, without any ORB-specific
optimizations in the generated TORBA proxies.


³ JavaIDLscript is our second implementation of the OMG IDLscript
language, offering access to CORBA and Java objects at the same time.


There are a few ways to improve trading performance with TORBA. First,
if the trader is used only through TORBA, then most of the trader
checks (like type checking) can be removed, since they are already
performed by the proxies. Thus, the overhead brought by the proxy
would be balanced. Second, proxies could be located close to the
trading server rather than close to the client. Then, the number of
network requests could be optimized, improving global performance.
Third, smart proxies, local to the client, could cache trading
results, improving global performance. Finally, a composition of the
three aspects would bring the best results. Points two and three are
not incompatible, as proxies would be split between the trading server
and the client application. Client-side proxies would perform caching,
while server-side ones would be dedicated to type checking and network
optimization.
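
The client-side caching option can be sketched in Python as follows. This is an assumption-laden illustration, not part of TORBA: the CachingLookup class, the fake_trader function, and the explicit invalidate operation are hypothetical, and the text does not specify how cached results would be kept consistent with offers that are later exported or withdrawn.

```python
class CachingLookup:
    """Hypothetical client-side proxy that caches results per constraint."""

    def __init__(self, query_fn):
        self._query_fn = query_fn   # stands in for the remote trader call
        self._cache = {}
        self.remote_calls = 0

    def query(self, constraint):
        if constraint not in self._cache:
            self.remote_calls += 1
            self._cache[constraint] = tuple(self._query_fn(constraint))
        # Return a fresh list so callers cannot corrupt the cache.
        return list(self._cache[constraint])

    def invalidate(self):
        # Assumed policy: the application decides when results are stale.
        self._cache.clear()


def fake_trader(constraint):
    # Placeholder for a remote invocation; returns a made-up result.
    return ["offer-for:" + constraint]


cache = CachingLookup(fake_trader)
first = cache.query("ppm >= 10")
second = cache.query("ppm >= 10")   # served from the cache
```

Only the first call reaches the trader, which is the saving the text refers to; the cost is that a cached answer may no longer reflect the current set of offers.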


5 Comparison and Source of Inspiration


In the 1980s, the ODP community defined a type management system for
the ODP trading function [10]. This research result has been partly
integrated in the ODP/OMG CosTrading specification. However, to our
knowledge, no work similar to our TORBA proposal has been performed to
reduce the CosTrading complexity and to increase the reliability of
trading-based applications. The trading aspects of ODL (Object
Definition Language), defined by the TINA consortium [22], were only
related to defining trading properties of objects. These could only be
seen as basic trading contracts: attributes are inevitably strings and
ODL does not permit the definition of typed queries. Thus, the
complexity of using the trader is not actually reduced. Our original
work, for its part, relies on the use of well-known mechanisms of
distributed object computing middleware: the proxy principle, the ORB
structure, and the component approach.


Figure 12: The TORBA Explorer.


The proxy principle has been defined in [23] as a structural concept
for building distributed applications: a proxy acts on behalf of a
remote object. This principle extends the RPC (Remote Procedure Call)
mechanism as defined in [3] in order to use it in an object-oriented
context (i.e., Remote Method Invocation). At the communication level,
a proxy (a.k.a. stub) serializes invocations to remote objects, as in
the CORBA [19], DCOM [8], and Java RMI [24] environments. Such a proxy
implementation fully hides the technical details of the serialization
process: marshalling of parameters into a network message, handling of
heterogeneity, network layer and error management, and finally
unmarshalling of the network reply into application data. These
proxies are generated from communication contracts written in an
interface definition language (IDL). These IDL descriptions simplify
and automate the production of communication code, increasing the
reliability of applications. In the context of TORBA, the
communication contract concept, the IDL language, and communication
proxies are transposed to trading contracts, the TDL language, and
trading proxies. Thus, TDL descriptions simplify and automate the
production of trading-related code, increasing application
reliability.


Compared to the smart proxies used in the Quality of Objects (QuO)
middleware [30], or as an implementation of meta-programming
mechanisms [29], TORBA proxies cannot be labeled as smart. In these
two examples, smart proxies are proxies that potentially perform more
processing (like logging, caching, QoS control, or meta-programming)
in a way that is transparent from the client's point of view. Extra
processing is added to the standard one, without modifying the proxy
interface. In the context of TORBA, proxies have an explicit interface
which is different from the classical interface of the CosTrading
service. Moreover, smart proxies tend to offer dynamic mechanisms for
reconfiguration, while TORBA proxies are quite static and cannot be
changed dynamically at runtime.


TORBA is close to CORBA. The OMG IDL language permits designers to
describe interface contracts for CORBA objects, while the TDL language
permits them to define trading contracts. The OMG IDL language is
compiled to produce communication stubs, or to feed the Interface
Repository. Similarly, the TDL language is compiled to generate
trading proxies, or to feed the trading contract repository. CORBA
stubs rely on an ORB runtime, encapsulating the GIOP/IIOP protocol,
while TORBA proxies rely on the TORBA runtime, hiding the CosTrading
service as well as the ORB.


The component-oriented approach is the last source 
of inspiration of the TORBA proposal. As an ex- 
ample, the CORBA Component Model [18] defines 
a component as being a software entity providing 
multiple interfaces (or facets). Each facet is a point 
of view on the component, which logically defines a 
set of operations. In the same way, TORBA provides access
to the generic ODP/OMG trading service through
facets dedicated to application requirements. Each
generated lookup proxy is a dedicated facet, that is, a
point of view on the trading service, as depicted in
Figure 13.







[Figure 13 diagram; recoverable labels: Printer Trading Facet, Trading
Facet, Generic Trading Service]

Figure 13: TORBA, towards a ‘componentized’ trading service.


6 Conclusion 


First, this paper has reviewed the ODP/OMG CosTrading service. This
review has shown the use of the service to be very technical and
complex due to the lack of a structured approach. The various
drawbacks caused by the lack of type checking at compilation time have
been underlined. Then, the lack of a formalism to define offer types
and search operations has been presented as one of the reasons for the
complexity of the service.


Then, TORBA has been presented as a framework structuring the use of
the ODP/OMG trading service. The conceptual contribution of this paper
lies in the definition of the trading contract concept as a paradigm
to structure the trading activity. The benefits of using the TDL
formalism and its associated tools have been discussed. Using an
example, the benefits of TORBA have been illustrated in terms of type
checking, simplicity, productivity, and reliability of applications.


All the elements depicted in this paper have been 
prototyped and experiments have been performed 
using IDLscript and Java languages, as well as the 
ORBacus trading service [20]: TDL compilers (BNF 
and XML versions), proxy generators (OMG IDL, 
OMG IDLscript, and Java), runtime environments 
for IDLscript and Java, the trading contract reposi- 
tory, as well as the TORBA explorer are already op- 
erational. The next step is to finalize the TORBA 
environment in order to release it, and to obtain 
experiment/use feedback from end-users. 


We have a good deal of further work planned around TORBA: (1) support
for C++ applications, (2) experiments with other CosTrading
implementations, (3) measurement of the overhead implied by TORBA
proxies, (4) experiments with iterators, dynamic properties, and
lookup strategies, (5) extension towards asynchronous trading
(notification to applications of newly exported offers), and (6) use
of the TORBA approach in the context of Jini, trading serialized
objects and not only references.


In the meantime, TORBA is part of our current research work. We intend
to use TORBA in order to experiment with the concept of Component
Oriented Trading (COT) [27]. In that, TORBA would become the basis of
TOSCA (Trading Oriented System for Component-based Applications),
whose goal is to provide an environment to deploy and administer
distributed component-based applications [12].


Finally, in a more ambitious vision, we intend 
to consider the benefits of a language to perform 
queries and to act upon distributed objects. The 
goal would be to unify search operations on trad- 
ing services, object-oriented databases, and object 
environments à la JavaSpaces [25]. This language
could be named TORBA Query Language, rely- 
ing upon the following equation: 


TQL = TDL + OCL + OQL + IDLscript


Abstract


Component technology promotes code-reuse by en- 
abling the construction of complex applications 
by assembling off-the-shelf components. However, 
components depend on certain characteristics of the 
environment in which they execute. They depend 
on other software components and on hardware re- 
sources. 


In existing component architectures, the application 
developer is left with the task of resolving those dependencies,
i.e., making sure that each component
has access to all the resources it needs and that 
all the required components are loaded. Neverthe- 
less, according to encapsulation principles, develop- 
ers should not be aware of the component internals. 
Thus, it may be difficult to find out what a compo- 
nent really needs. In complex systems, this manual 
approach to dependency management can lead to 
disastrous results. 


In this paper, we propose an integrated archi- 
tecture for managing dependencies in distributed 
component-based systems in an effective and uni- 
form way. The architecture supports automatic con- 
figuration and dynamic resource management in dis- 
tributed heterogeneous environments. We describe 
a concrete implementation of this architecture and 
present experimental results.


1 Introduction


As computer systems are being applied to more and 
more aspects of personal and professional life, the 
quantity and complexity of software systems is in- 
creasing considerably. At the same time, the di- 
versity in hardware architectures remains large and 
is likely to grow with the deployment of embedded 
systems, PDAs, and portable computing devices. 
All these platforms will coexist with personal com- 
puters, workstations, computing servers, and super- 
computers. The construction of new systems and 
applications in an easy and reliable way can only be 
achieved through the composition of modular hard- 
ware and software. 


Component technology has appeared as a powerful 
tool to confront this challenge. Recently developed 
component architectures support the construction 
of sophisticated systems by assembling together a 
collection of off-the-shelf software components with 
the help of visual tools or programmatic interfaces. 
Components will be the unit of packaging, distri- 
bution, and deployment in the next generation of 
software systems. However, there is still very lit- 
tle support for managing the dependencies among 
components. Components are created by different 
programmers, often working in different groups with 
different methodologies. It is hard to create robust 
and efficient systems if the dependencies between 
components are not well understood. 


Until recently, highly-dynamic environments with mobile computers,
active spaces, and ubiquitous multimedia were only present in science
fiction stories or in the minds of visionary scientists like Mark
Weiser [Wei92]. But now they are becoming a reality, and one of the
most important challenges they pose is the proper management of
dynamism. Future computer systems must be able to configure themselves
dynamically, adapting to the environment in which they are executing.
Furthermore, they must be able to react to changes in the environment
by dynamically reconfiguring themselves to keep functioning with good
performance, irrespective of modifications in the environment.


Unfortunately, the existing software infrastructure 
is not prepared to manage these highly-dynamic en- 
vironments properly. 


Existing component-based systems face significant 
problems with reliability, administration, architec- 
tural organization, and configuration. The problem 
behind all these difficulties is the lack of a unified 
model for representing dependencies and mecha- 
nisms for dealing with these dependencies. Compo- 
nents depend on hardware resources (such as CPU, 
memory, and special devices) and software resources 
(such as other components, services, and the operat- 
ing system). Not resolving these dependencies prop- 
erly compromises system efficiency and reliability. 


As systems become more complex and grow in scale, and as environments
become more dynamic, the effects of the lack of proper dependence
management become more dramatic. Therefore, we need an integrated
approach in which operating systems, middleware, and applications
collaborate to manage the components in complex software systems,
dealing with their hardware and software dependencies properly.


Software is in constant evolution and new compo- 
nent versions are released frequently. How can one 
run the most up-to-date components and make sure 
that they work together in harmony? This requires 
mechanisms for (1) code distribution over wide-area 
networks so we can push or pull new components 
as they become available and (2) safe dynamic re- 
configuration so we can plug new components when 
desired. 


In previous papers, we introduced a model for representing
dependencies in distributed component systems [KC99] and described a
reflective ORB that supports dynamic component loading in distributed
environments [KRL+00, KGA+00]. In this paper, we extend our previous
work by describing the design, implementation, and performance of an
integrated architecture that provides mechanisms for:


1. Automatic configuration of component-based 
applications. 


2. Intelligent, dynamic placement of applications 
in the distributed system. 


3. Dynamic resource management for distributed 
heterogeneous environments. 


4. Component code distribution using push and 
pull methods. 


5. Safe dynamic reconfiguration of distributed 
component systems. 


1.1 Paper Contents 


Section 2 gives a general overview of our architecture for automatic
configuration and dynamic resource management. Section 3 details the
automatic configuration mechanisms, explaining the concepts of
prerequisites (Section 3.1), component configurators (Section 3.2),
and the Automatic Configuration Service (Section 3.3). Section 4
describes the Resource Management Service, addressing resource
monitoring (Section 4.1), resource reservation (Section 4.2),
application execution (Section 4.3), and fault-tolerance and
scalability (Section 4.4).


Section 5 gives additional implementation details and presents
experimental results. We then present related work (Section 6), future
work (Section 7), and our conclusions (Section 8).


2 Architectural Framework 


To deal with the highly-dynamic environments of the next decades, we
propose an architectural framework divided into three parts. First, a
mechanism for dependence representation lets developers specify
component dependencies and write software that deals with these
dependencies in customized ways. Second, an Automatic Configuration
Service is responsible for dynamically instantiating component-based
applications by analyzing and resolving their component dependencies
at runtime. Third, a Resource Management Service is responsible for
managing the hardware resources in the distributed system, exporting
interfaces for inspecting, locating, and allocating resources in the
distributed, heterogeneous system.


Figure 1 presents a schematic view of the major elements of our
architecture. Prerequisite specifications reify the static
dependencies of components on their environment, while component
configurators reify dynamic, runtime dependencies.


Figure 1: Architectural Framework


As we explain in Section 3, the automatic configu- 
ration process is based on the prerequisite specifica- 
tions and constructs the component configurators. 
As the Automatic Configuration Service instanti- 
ates new components, it uses the Resource Manage- 
ment Service to allocate resources for them. At ex- 
ecution time, changes in resource availability may 
trigger call-backs from the Resource Management 
Service to component configurators so that compo- 
nents can adapt to significant changes in the under- 
lying environment. 


As described in Section 4.3, when a client asks the Resource
Management Service to execute an application, the service finds the
best location to execute the application and then uses the Automatic
Configuration Service to load the application components.


The elements of the architecture are exported as 
CORBA services and their implementation relies 
on standard CORBA services such as Naming and 
Trading [OMG98]. 


We have employed the architecture presented here 
to support a reliable, dynamically configurable Mul- 
timedia Distribution System. Readers interested in 
a detailed description of how our services were used 
in that particular application scenario should refer 
to [KCN00]. In the following sections, we provide a
more in-depth description of each of the elements of 
the architecture. 


3 Automatic Configuration 


Software systems are evolving more rapidly than
ever before. Vendors release new versions of web 
browsers, text editors, and operating systems once 
every few months. System administrators and users 
of personal computers spend an excessive amount of 
time and effort configuring their computer accounts, 
installing new programs, and, above all, struggling 
to make all the software work together. 


In environments like MS-Windows, the installation 
of some applications is partially automated by “wiz- 
ard” interfaces that direct the user through the in- 
stallation process. However, it is common to face 
situations in which the installation cannot complete 
or in which it completes but the software package 
does not run properly because some of its (unspeci- 
fied) requirements are not met. In other cases, after 
installing a new version of a system component or a
new tool, applications that used to work before the
update stop functioning. Typically, applications on
MS-Windows cannot be cleanly uninstalled.
Often, after executing special uninstall procedures,
“junk” libraries and files are left in the system. The
application does not know whether it can remove all the
files it has installed because the system does not
provide clear mechanisms to specify which
applications are using which libraries.


To solve this problem, we need a completely new paradigm for
installing, updating, and removing software from workstations and
personal computers. We propose to automate the process of software
maintenance with a mechanism we call Automatic Configuration. In our
design of an automatic configuration service for modern computer
environments, we focus on two key objectives:


1. Network-Centrism and


2. a “What You Need Is What You Get” (WYNIWYG) model.


¹ When even PhD students in Computer Science have trouble keeping
their commodity personal computers functioning properly, one can
notice that something is very wrong in the way that commercial
software is built nowadays.


Network-Centrism refers to a model in which all en- 
tities, users, software components, and devices exist 
in the network and are represented as distributed 
objects. Each entity has a network-wide identity, 
a network-wide profile, and dependencies on other 
network entities. When a particular service is con- 
figured, the entities that constitute that service are 
assembled dynamically. Users no longer need to
keep several different accounts, one for each device 
they use. In the network-centric model, a user has a 
single network-wide account, with a single network- 
wide profile that can be accessed from anywhere in 
the distributed system. The middleware is responsi- 
ble for instantiating user environments dynamically 
according to the user’s profile, role, and the
underlying platform [CKB+00].


In contrast to existing operating systems, middle- 
ware, and applications where a large number of non- 
utilized modules are carried along with the standard 
installation, we advocate a What You Need Is What 
You Get model, or WYNIWYG. In other words, 
the system should configure itself automatically and 
load a minimal set of components required for ex- 
ecuting the user applications in the most efficient 
way. The components are downloaded from the net- 
work, so only a small subset of system services are 
needed to bootstrap a node. 


In the Automatic Configuration model, system 
and application software are composed of network- 
centric components, i.e., components available for 
download from a Component Repository present in 
the network. Component code is encapsulated in 
dynamically loadable libraries (DLLs in Windows 
and shared objects in Unix), which enables dynamic 
linking. 


Each application, system, or component² specifies everything that is
required for it to work properly (both hardware and software
requirements). This collection of requirements is called Prerequisite
Specifications or, simply, Prerequisites.


² From now on, we use the term “component” not only to refer to a
piece of an application or system but also to refer to the entire
application or system. This is consistent since, in our model,
applications and systems are simply components that are made of
smaller components.


3.1 Prerequisites


The prerequisites for a particular inert component (stored on a local
disk or on a network component repository) must specify any special
requirements for properly loading, configuring, and executing that
component. We consider three different kinds of information that can
be contained in a list of prerequisites.


1. The nature of the hardware resources the com- 
ponent needs. 


2. The capacity of the hardware resources it 
needs. 


3. The software services (i.e., other components) 
it requires. 


The first two items are used by the Resource Man- 
agement Service to determine where, how, and when 
to execute the component. QoS-aware systems can 
use these data to enable proper admission control, 
resource negotiation, and resource reservation. The 
last item determines which auxiliary components 
must be loaded and in which kind of software en- 
vironment they will execute. 


The first two items, reminiscent of the Job Control Languages of the
mid-1960s, can be expressed by modern QoS specification languages such
as QML [FK99b] and QoS aspect languages [LBS+98], or by using a
simpler format such as SPDF (see Section 3.4.1). The third item is
equivalent to the require clause in architectural description
languages like Darwin [MDK94] and module interconnection languages
like the one used in Polylith [Pur94].


The prerequisites are instrumental in implementing 
the WYNIWYG model as they let the system know 
what the exact requirements are for instantiating
the components properly. If the prerequisites are 
specified correctly, the system not only loads all the 
necessary components to activate the user environ- 
ment, but also loads a minimal set of components 
required to achieve that. 
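
The "minimal set of components" idea can be sketched in Python as follows: starting from the component to run, only the components reachable through its software prerequisites are loaded. This is an illustrative sketch, not the Automatic Configuration Service: the REQUIRES table, the component names, and the components_to_load function are hypothetical, and hardware prerequisites are left out.

```python
def components_to_load(root, requires):
    """Return root plus everything it transitively requires.

    Dependencies appear before their dependents, so the list can be
    loaded in order. A cyclic specification is reported as an error.
    """
    ordered = []
    visiting = set()
    done = set()

    def visit(name):
        if name in done:
            return
        if name in visiting:
            raise ValueError(f"cyclic prerequisites involving {name!r}")
        visiting.add(name)
        for dependency in requires.get(name, ()):
            visit(dependency)
        visiting.discard(name)
        done.add(name)
        ordered.append(name)

    visit(root)
    return ordered


# Hypothetical software prerequisites: component -> required components.
REQUIRES = {
    "video_client": ["decoder", "orb"],
    "decoder": ["orb"],
    "orb": [],
    "spell_checker": [],   # available in the repository, but never needed
}

load_order = components_to_load("video_client", REQUIRES)
```

Components that are available but not reachable from the requested one (here spell_checker) are never loaded, which is the point of the WYNIWYG model. The result is only as good as the prerequisite specifications it is computed from.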


We currently rely on the component programmer to specify component
prerequisites. Mechanisms for automating the creation of prerequisite
specifications and for verifying their correctness require further
research and are beyond the scope of this paper. Another interesting
topic for future research is the refinement of prerequisite
specifications at runtime according to what the system can learn from
the execution of components in a certain environment. This can be
achieved by using QoS profiling tools such as QualProbes [LN00].


USENIX Association


3.2 Component Configurator


The explicit representation of dynamic dependen- 
cies is achieved through special objects attached to 
each relevant component at execution time. These 
objects are called component configurators; they are 
responsible for reifying the runtime dependencies for
a certain component and for implementing policies 
to deal with events coming from other components. 


While the Automatic Configuration Service parses 
the prerequisite specifications, fetches the required 
components from the Component Repository, and 
dynamically loads their code into the system run- 
time, it uses the information in the prerequisite 
specifications to create component configurators 
representing the runtime inter-component depen- 
dencies. Figure 2 depicts the dependencies that a 
component configurator reifies. 


[Figure 2 diagram; recoverable labels: depends on, COMPONENT
CONFIGURATOR]

Figure 2: Reification of Component Dependencies


The dependencies of a component $C$ are managed by a component
configurator $C^c$. Each configurator $C^c$ has a set of hooks to
which other configurators can be attached. These are the configurators
for the components on which $C$ depends; they are called hooked
components. The components that depend on $C$ are called clients;
$C^c$ also keeps a list of references to the clients' configurators.
In general, every time one defines that a component $C_1$ depends on a
component $C_2$, the system should perform two actions:


1. attach $C_2^c$ to one of the hooks in $C_1^c$ and


2. add $C_1^c$ to the list of clients in $C_2^c$.


Component configurators are also responsible for 
distributing events across the inter-dependent com- 
ponents. Examples of common events are the failure 
of a client and destruction, internal reconfiguration, 
or replacement of the implementation of a hooked 
component. The rationale is that such events af- 
fect all the dependent components. The component 
configurator is the place where programmers must 
insert the code to deal with these configuration- 
related events. 
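
The bookkeeping and event distribution described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class, method, and event names are hypothetical and are not the 2K interfaces, and the default policy merely records each event it receives.

```python
class ComponentConfigurator:
    """Reifies the runtime dependencies of one component (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.hooks = {}      # hook name -> configurator of a component we depend on
        self.clients = []    # configurators of the components that depend on us
        self.received = []   # events delivered to this configurator

    def hook(self, hook_name, other):
        """Record that this component depends on the component of `other`."""
        self.hooks[hook_name] = other   # action 1: attach `other` to one of our hooks
        other.clients.append(self)      # action 2: add ourselves to its client list

    def handle_event(self, event, source):
        """Default policy: remember the event. Subclasses would override this."""
        self.received.append((event, source.name))

    def notify_clients(self, event):
        """Propagate an event to every component that depends on this one."""
        for client in self.clients:
            client.handle_event(event, self)

    def notify_hooked(self, event):
        """Propagate an event to every component this one depends on."""
        for hooked in self.hooks.values():
            hooked.handle_event(event, self)


browser = ComponentConfigurator("browser")
jvm = ComponentConfigurator("jvm")
browser.hook("JVM", jvm)          # the browser depends on the JVM
jvm.notify_clients("REPLACED")    # e.g., the JVM implementation was replaced
browser.notify_hooked("FAILED")   # e.g., the client failed
```

Keeping both directions of the relationship (hooks and clients) is what lets an event travel either way without a global registry.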


Component developers can program specialized ver- 
sions of component configurators that are aware of 
the characteristics of specific components. These 
specialized configurators can, therefore, implement 
customized policies to deal with component depen- 
dencies in application-specific ways. 


As an example of how customized component con- 
figurators could help applications, consider a QoS- 
sensitive video-on-demand client that reserves a por- 
tion of the local CPU for decoding a video stream. 
The application developer can program a special 
configurator that registers itself with the Resource 
Management Service. In this way, when the Re- 
source Management Service detects a change in re- 
source availability that would prevent the applica- 
tion from getting the desired level of service, it no- 
tifies the configurator (as shown in Figure 1). The 
configurator, with its customized knowledge about 
the application, sends a message to the video server 
requesting that the latter decrease the video frame 
rate. Then, with a lower frame rate, the client is 
able to process the video while the limited resource 
availability persists. When the resources go back 
to normal, another notification allows the video-on- 
demand configurator to re-establish the initial level 
of service. 
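
A specialized configurator of this kind might look like the following Python sketch. The stub server, the method names, and the way resource changes are reported (available versus required CPU share) are assumptions made for illustration; the paper does not define this interface.

```python
class VideoServerStub:
    """Stand-in for the remote video server (hypothetical)."""

    def __init__(self, frame_rate):
        self.frame_rate = frame_rate

    def set_frame_rate(self, fps):
        self.frame_rate = fps


class VideoClientConfigurator:
    """Application-specific policy: trade frame rate for CPU availability."""

    def __init__(self, server, degraded_fps):
        self.server = server
        self.initial_fps = server.frame_rate   # level of service to restore later
        self.degraded_fps = degraded_fps

    def on_resource_change(self, cpu_share_available, cpu_share_needed):
        """Called when the resource manager reports a change (assumed callback)."""
        if cpu_share_available < cpu_share_needed:
            # Not enough CPU to decode at full rate: ask the server to slow down.
            self.server.set_frame_rate(self.degraded_fps)
        else:
            # Resources are back to normal: re-establish the initial service level.
            self.server.set_frame_rate(self.initial_fps)


server = VideoServerStub(frame_rate=30)
configurator = VideoClientConfigurator(server, degraded_fps=15)
configurator.on_resource_change(cpu_share_available=0.05, cpu_share_needed=0.10)
configurator.on_resource_change(cpu_share_available=0.20, cpu_share_needed=0.10)
```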


3.3 Automatic Configuration Service 


As described above, automatic configuration en- 
ables the construction of network-centric systems 
following a WYNIWYG model. To experiment with 
these ideas, we developed an Automatic Configuration
Service for the 2K operating system [KCM+00].


Different application domains may have different
ways of specifying the prerequisites of their
application components. Therefore, rather than limiting
the specification of prerequisites to a particular
language, we built the Automatic Configuration
Service as a framework in which different kinds of 
prerequisite descriptions can be utilized. To vali- 
date the framework, we designed the Simple Pre- 
requisite Description Format (SPDF), a very sim- 
ple, text-based format that allowed us to perform 
initial experiments. In the future, more elaborate
prerequisite formats, including sophisticated
QoS descriptions [FK99b, LBS+98], can be plugged
into the framework easily.


In addition, depending upon the dynamic availability
of resources and connectivity constraints, different
algorithms for prerequisite resolution may be
desired. For example, if a diskless PDA is connected to
a network through a 2Mbps wireless connection, it
will be beneficial to download all the required
components from a central repository each time they
are needed. On the other hand, if a laptop computer
with a large disk connects to the network via
modem, it will probably be better to cache the
components on the local disk and reuse them whenever
possible.


Figure 3 shows how the architecture uses the 
two basic classes of the Automatic Configuration 
framework: prerequisite parsers and prerequisite
resolvers. Administrators and developers can plug in
different concrete implementations of these classes to
implement customized policies.


Figure 3: Automatic Configuration Framework





The automatic configuration process works as fol- 
lows. First, the client sends a request for loading an 
application by passing, as parameters, the name of
the application’s “master” component and a reference
to a component repository (step 1 in Figure 3).
The request is received by the prerequisite resolver, 
which fetches the component code and prerequisite 
specification from the given repository, or from a lo- 
cal cache, depending on the policy being used (step 
2.1). 


Next, the prerequisite resolver calls the prerequisite 
parser to process the prerequisite specification (step 
2.2). As it scans the specification, the parser issues 
recursive calls to the prerequisite resolver to load 
the components on which the component being pro- 
cessed depends (step 2.3). This may trigger several 
iterations over steps 2.1, 2.2, and 2.3. 


After all the dependencies of a given component are 
resolved, the parser issues a call to the Resource 
Manager to negotiate the allocation of the required 
resources (step 3). After all the application compo- 
nents are loaded, the service returns a reference to 
the new application to the client (step 4). 
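
The mutual recursion between resolver and parser (steps 2.1 to 2.3) can be sketched in Python as follows. All names are hypothetical, the repository is an in-memory dictionary, and a specification is reduced to a plain list of required component names. Resource negotiation (step 3), cycle detection, and sharing of components required more than once are deliberately left out.

```python
class ComponentRepository:
    """In-memory stand-in for a component repository (hypothetical)."""

    def __init__(self, entries):
        # component name -> (code, list of names of required components)
        self.entries = entries

    def fetch(self, name):
        return self.entries[name]


class PrerequisiteParser:
    """Treats a specification as a plain list of required component names."""

    def parse(self, spec, resolver, repository):
        # Step 2.3: ask the resolver to load every required component.
        return [resolver.load(dep, repository) for dep in spec]


class PrerequisiteResolver:
    def __init__(self, parser):
        self.parser = parser
        self.load_order = []   # names in the order their loading completed

    def load(self, name, repository):
        code, spec = repository.fetch(name)                  # step 2.1
        hooked = self.parser.parse(spec, self, repository)   # step 2.2
        self.load_order.append(name)
        return {"name": name, "code": code, "hooked": hooked}


repository = ComponentRepository({
    "browser": ("<browser code>", ["fs", "net"]),
    "fs": ("<fs code>", []),
    "net": ("<net code>", []),
})
resolver = PrerequisiteResolver(PrerequisiteParser())
application = resolver.load("browser", repository)   # steps 1 and 4
```

Because a component finishes loading only after its prerequisites do, `load_order` ends up listing dependencies before the components that need them.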


3.4 A Concrete Implementation 


To evaluate the framework, we created concrete 
implementations of the prerequisite parser and re- 
solver. The prerequisite parser, called SPDFParser, 
processes SPDF specifications. The first prerequi- 
site resolver, called SimpleResolver, uses CORBA 
to fetch components from the 2K Component 
Repository. The second, called CachingResolver, 
is a subclass of SimpleResolver that caches the 
components on the local file system. 


3.4.1 SPDF 


We designed the Simple Prerequisite Description 
Format (SPDF) to serve as a proof-of-concept for 
our framework. An SPDF specification is divided into
two parts: the first is called hardware requirements
and the second, software requirements. Figure 4
shows an example of an SPDF specification for a
hypothetical web browser. The first part specifies
that this application was compiled for a Sparc
machine running Solaris 2.7, that it requires at least
5MB of RAM but functions optimally with 40MB,
and that it requires 10% of a CPU with speed higher
than 300MHz.


:hardware requirements
machine_type   SPARC
os_name        Solaris
os_version     2.7
min_ram        5MB
optimal_ram    40MB
cpu_speed      >300MHz
cpu_share      10%

:software requirements
FileSystem     CR:/sys/storage/DFS1.0 (optional)
TCPNetworking  CR:/sys/networking/BSD-sockets
WindowManager  CR:/sys/WinManagers/simpleWin
JVM            CR:/interp/Java/jvm1.2 (optional)


Figure 4: A Simple Prerequisite Description


The second part, software requirements, specifies
that the web browser requires four components (or
services): a file system (to use as a local cache
for web pages), a TCP networking service (to fetch
the web pages), a window manager (to display the
pages), and a Java virtual machine (to interpret
Java Applets).


The first line in the software requirements sec- 
tion specifies that the component that implements 
the file system (or the proxy that interacts with 
the file system) can be located in the directory 
/sys/storage/DFS1.0 of the component reposi- 
tory (CR). It also states that the file system is an 
“optional” component, which means that the web 
browser can still function without a cache. Thus, if 
the Automatic Configuration Service is not able to 
load the file system component, it simply issues a 
warning message and continues its execution. 
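
To make the format concrete, here is a small Python sketch of a parser for specifications shaped like the one in Figure 4. It is not the SPDFParser of the 2K implementation: the function name, the returned data structures, and the treatment of malformed lines are assumptions, and the grammar is inferred only from the example in the figure.

```python
def parse_spdf(text):
    """Return (hardware, software) parsed from an SPDF-like specification."""
    hardware = {}    # attribute name -> required value, kept as a string
    software = []    # one dict per required component
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(":"):
            # ":hardware requirements" -> "hardware"
            section = line[1:].split()[0]
            continue
        parts = line.split()
        if len(parts) < 2 or section not in ("hardware", "software"):
            raise ValueError("malformed line: " + line)
        if section == "hardware":
            hardware[parts[0]] = parts[1]
        else:
            software.append({
                "hook": parts[0],
                "location": parts[1],
                "optional": parts[-1] == "(optional)",
            })
    return hardware, software


SPEC = """
:hardware requirements
machine_type SPARC
os_name Solaris
os_version 2.7
min_ram 5MB
optimal_ram 40MB
cpu_speed >300MHz
cpu_share 10%

:software requirements
FileSystem CR:/sys/storage/DFS1.0 (optional)
TCPNetworking CR:/sys/networking/BSD-sockets
WindowManager CR:/sys/WinManagers/simpleWin
JVM CR:/interp/Java/jvm1.2 (optional)
"""

hardware, software = parse_spdf(SPEC)
```

A loader built on this result would treat a failure to load an entry whose `optional` flag is set as a warning rather than an error.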


3.4.2 Simple Resolver and Caching Resolver


The SimpleResolver fetches the component im- 
plementations and component prerequisite specifi- 
cations from the 2K Component Repository. It 
stores the component code in the local file system 
and dynamically links the components to the system
runtime. As new components are loaded, they
are attached to hooks in the component configurator
of the parent component, i.e., the component
that required them. In the web browser example,
the SimpleResolver would add hooks to the
web browser configurator, call them FileSystem,
TCPNetworking, WindowManager, and JVM, and attach
the respective component configurators to each
of these hooks.


Resolvers can be extended using inheritance. For 
example, with very little work, we extended the 
SimpleResolver to create a CachingResolver that
checks for the existence of the component in the lo- 
cal disk (cache) before fetching it from the remote 
repository. 
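
The extension by inheritance can be illustrated with the Python sketch below. Only the two class names come from the text; the `fetch` method, the dictionary standing in for the remote repository, and the in-memory dictionary standing in for the local disk cache are assumptions. The sketch also ignores component versions, so a cached entry is never refreshed.

```python
class SimpleResolver:
    """Always fetches from the (simulated) remote repository."""

    def __init__(self, repository):
        self.repository = repository   # component name -> component code
        self.remote_fetches = 0        # counts simulated network round trips

    def fetch(self, name):
        self.remote_fetches += 1
        return self.repository[name]


class CachingResolver(SimpleResolver):
    """Checks a local cache before going to the remote repository."""

    def __init__(self, repository):
        super().__init__(repository)
        self.cache = {}                # stands in for the local file system

    def fetch(self, name):
        if name not in self.cache:
            self.cache[name] = super().fetch(name)
        return self.cache[name]


resolver = CachingResolver({"jvm": "<jvm code>"})
resolver.fetch("jvm")   # first request goes to the repository
resolver.fetch("jvm")   # second request is served from the cache
```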


3.5 Simplifying Management 


The Automatic Configuration Service simplifies 
management of user environments in distributed 
systems greatly. Whenever a new application is re- 
quested, the service downloads the most up-to-date 
version of its components from the network Com- 
ponent Repository and installs them locally. This 
provides several advantages including the following. 


• It eliminates the need to upload components to
the entire network each time a component is
updated.


• It eliminates the need to keep track manually of
which machines hold copies of each component
because updates are automatic.


• It helps machines with limited resources, which
no longer need to store all components locally.


3.6 Pushing Component Updates 


The automatic configuration mechanism described 
here provides a pull-based approach for code up- 
dates and configuration. In other words, the service 
running in a certain network node takes the initia- 
tive to pull the code and configuration information 
from a Component Repository. 


To support efficient and scalable management in 
large-scale systems, it may be desirable to allow 
system administrators to push code and configuration
information into the network. Our architecture
achieves this by using the concept of mobile
reconfiguration agents, which we describe in detail
elsewhere [KGA+00].


4 Resource Management Service


The Resource Management Service [Yam00] is or- 
ganized as a collection of CORBA servers that are 
responsible for (1) maintaining information about 
the dynamic resource utilization in the distributed 
system, (2) locating the best candidate machine to 
execute a certain application or component based 
on its QoS prerequisites, and (3) allocating local re- 
sources for particular applications or components. 


As shown in Figure 5, the Resource Management 
Service relies on Local Resource Managers (LRMs) 
present in each node of the distributed system. The 
LRM’s task is to export the hardware resources 
of a particular node to the whole network. The 
distributed system is divided into clusters and each
cluster is managed by a Global Resource Manager 
(GRM). 


Figure 5: Resource Management Service





4.1 Resource Monitoring 


The LRMs running in each network node send up- 
dates of the state of their resources (e.g., CPU and 
memory usage) to the GRM periodically. The GRM 
implementation encompasses an instance of the 
standard OMG Object Trading Service [OMG98]. A 
reference to the LRM of each machine in the cluster 
is stored in the GRM database as a trader “service 
offer” and the state of its resources is stored as the 
offer’s “properties”. 


To reduce network and GRM load, it is important
to limit the frequency with which the LRMs send their
updates to the GRM. Thus, although LRMs check
the state of their local resources frequently (e.g.,
every ten seconds), they only send this information
to the GRM when (1) there were significant changes
in resource utilization since the last update (e.g., a
variation of more than 20% in the CPU load) or
(2) a certain time has passed since the last update
was sent (e.g., three minutes). In addition, when a
machine leaves the network, in case of a shutdown or
a voluntary disconnection of a mobile computer, the
LRM unregisters itself from the GRM database. If
the GRM does not receive an update from an LRM
for a period twice as long as the time in item 2
above, it assumes that the machine with that LRM
is inaccessible.
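
The update rule can be written down as a short Python sketch. The class and function names are hypothetical, the default values are the examples given in the text (20% and three minutes), and the 20% variation is interpreted here as 20 percentage points of CPU load, which is an assumption.

```python
class UpdatePolicy:
    """Decides whether an LRM should push its state to the GRM."""

    def __init__(self, change_threshold=20.0, max_silence=180.0):
        self.change_threshold = change_threshold   # percentage points of CPU load
        self.max_silence = max_silence             # seconds between forced updates
        self.last_load = None                      # load reported in the last update
        self.last_time = None                      # time of the last update

    def should_send(self, cpu_load, now):
        """Called at every local check (e.g., every ten seconds)."""
        first = self.last_time is None
        if (first
                or abs(cpu_load - self.last_load) > self.change_threshold
                or now - self.last_time >= self.max_silence):
            self.last_load, self.last_time = cpu_load, now
            return True
        return False


def is_unreachable(last_update_time, now, max_silence=180.0):
    """GRM side: no update for more than twice the forced-update period."""
    return now - last_update_time > 2 * max_silence


policy = UpdatePolicy()
decisions = [policy.should_send(load, t)
             for load, t in [(10, 0), (15, 10), (35, 20), (36, 30), (36, 200)]]
```

With these inputs, an update is sent on the first check, after the jump to 35% load, and again once three minutes have passed since the previous update.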


4.2 Resource Reservation 


The LRMs are also responsible for performing
QoS-aware admission control, resource negotiation,
reservation, and scheduling of tasks on a single node.
This is achieved with the help of a Dynamic Soft
Real-Time Scheduler [NhCN98] that runs as a
user-level process in conventional operating systems like
Solaris and Windows. The LRM works as a CORBA
wrapper for this scheduler, which uses the system’s
low-level real-time API to provide QoS guarantees
to applications with soft real-time requirements.


This CORBArized scheduler can be used at any 
time by CORBA clients to request QoS guarantees 
on the availability of CPU and memory. For ex- 
ample, as explained in Section 3.3, a prerequisite 
parser may issue requests to reserve CPU and mem- 
ory based on a component’s hardware prerequisite 
specifications. 


4.3 Executing Applications


Both the LRM and the GRM export an interface
that lets clients execute applications (or components)
in the distributed system. The GRM maintains an 
approximate view of the cluster resource utilization 
state and it uses this information as a hint for per- 
forming QoS-aware load distribution within its clus- 
ter. 


When a client wishes to execute a new application,
it sends an execute_application request to the
local LRM. The LRM checks whether the local
machine has enough resources to execute the
application comfortably. If not, it forwards the request to
the GRM. The latter uses its information about the
resource utilization in the distributed system to
select a machine that would be the best candidate to
execute that application and forwards the request,
as a oneway message, to the LRM of that machine.
The LRM of the latter machine tries to allocate the
resources locally; if it is successful, it sends a oneway
ACK message to the client LRM. If it is not possible
to allocate the resources on that machine, it sends
a NACK back to the GRM, which then looks for
another candidate machine. If the GRM exhausts all
the possibilities, it returns an empty offer to the
client LRM.
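
The placement protocol can be simulated with the Python sketch below. It is a simplification under several assumptions: the oneway CORBA messages are replaced by ordinary method calls, a single number (free CPU) stands in for all resources, a boolean return value plays the role of ACK and NACK, and `None` plays the role of the empty offer. The class and method names other than execute_application are hypothetical.

```python
class LRM:
    def __init__(self, name, free_cpu):
        self.name = name
        self.free_cpu = free_cpu
        self.grm = None               # set when the LRM registers with a GRM

    def try_allocate(self, cpu_needed):
        """Reserve resources locally; True stands for ACK, False for NACK."""
        if self.free_cpu >= cpu_needed:
            self.free_cpu -= cpu_needed
            return True
        return False

    def execute_application(self, cpu_needed):
        """Entry point used by the client; returns the chosen machine or None."""
        if self.try_allocate(cpu_needed):
            return self.name
        return self.grm.place(cpu_needed, exclude=self)


class GRM:
    def __init__(self):
        self.view = {}                # LRM -> last reported free CPU (only a hint)

    def register(self, lrm):
        self.view[lrm] = lrm.free_cpu
        lrm.grm = self

    def place(self, cpu_needed, exclude):
        # Try candidates from most to least promising according to the hint.
        candidates = sorted((l for l in self.view if l is not exclude),
                            key=lambda l: self.view[l], reverse=True)
        for lrm in candidates:
            if lrm.try_allocate(cpu_needed):   # ACK
                return lrm.name
        return None                            # every candidate answered NACK


grm = GRM()
a, b, c = LRM("a", 10), LRM("b", 80), LRM("c", 50)
for lrm in (a, b, c):
    grm.register(lrm)
b.free_cpu = 5                        # the GRM's view of b is now stale
chosen = a.execute_application(30)    # b answers NACK, c answers ACK
```

The stale entry for `b` shows why the GRM's information is only a hint: the final decision is always made by the LRM that owns the resources.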


When the system finally locates a machine with the 
proper resources, it creates a new process to host 
the application. Next, it uses the Automatic
Configuration Service to fetch all the necessary
components (i.e., the master component’s dependencies)
from the Component Repository and dynamically
load them into that process as described in Section
3.3.


4.3.1 Client Request Format 


The format of the client request to the initial LRM 
is the following. 


CosTrading::OfferSeq execute_application (
    in string categoryName,
    in string componentName,
    in string args,
    in CosTrading::PropertySeq QoS_spec,
    in CosTrading::Constraint platform_spec,
    in CosTrading::Preference prefs,
    in CosTrading::Lookup::SpecifiedProps return_props
);





categoryName/componentName specify which of the 
components in the 2K Component Repository is the 
master component of the application to be executed 
and args contains the arguments that should be 
passed to it at startup time. 


QoS_spec defines the quality of service required 
for this application. It is specified as a list of
<resourceName, resourceValue> pairs. As an example,
if the resource is the CPU, then the resource
value should be a structure of the following format
(specified by the scheduler’s CPU server [NhCN98]).


struct CpuReserve {
    long serviceClass;
    long period;
    long peakProcessingTime;
    long sustainableProcessingTime;
    long burstTolerance;
    float peakProcessingUtil;
};





platform_spec specifies the criteria used to select
a cluster node, expressed in the OMG
Trader Constraint Language. For example,
(os_name == 'Linux') and (processor_util < 40)
will select a Linux machine whose CPU utilization
is less than 40%.


prefs specifies the preferred machine in case mul- 
tiple machines satisfy the requirements. For exam- 
ple, max (RAM_free) will select the machine with the 
maximum available physical memory. 
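
The combined effect of platform_spec and prefs can be imitated in plain Python as shown below. This sketch does not use or parse the OMG constraint and preference languages: the constraint is an ordinary Python predicate, the preference is reduced to "maximize one property", and the offers and their values are invented for illustration.

```python
def select_machine(offers, constraint, preference):
    """Keep the offers that satisfy the constraint, then maximize one property."""
    matching = [offer for offer in offers if constraint(offer)]
    if not matching:
        return None   # no machine satisfies the platform specification
    return max(matching, key=lambda offer: offer[preference])


# Hypothetical service offers; the property names follow the examples in the text.
offers = [
    {"name": "n1", "os_name": "Linux", "processor_util": 55, "RAM_free": 512},
    {"name": "n2", "os_name": "Linux", "processor_util": 30, "RAM_free": 128},
    {"name": "n3", "os_name": "Linux", "processor_util": 10, "RAM_free": 256},
    {"name": "n4", "os_name": "Solaris", "processor_util": 5, "RAM_free": 1024},
]

best = select_machine(
    offers,
    constraint=lambda o: o["os_name"] == "Linux" and o["processor_util"] < 40,
    preference="RAM_free",
)
```

Here `n1` is rejected for its CPU utilization and `n4` for its operating system, so the preference only ranks `n2` and `n3`.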


Finally, return_props specifies which properties 
(resource utilization information) should be in- 
cluded in the service offer that is returned. The 
returned value also includes a reference to the com- 
ponent configurator (see Section 3.2) of the new ap- 
plication. 


4.4 Fault-Tolerance and Scalability 


To provide fault-tolerance and scalability, the Re- 
source Management Service architecture depends 
on a collection of replicated GRMs in each cluster. 
LRMs send their updates as a multicast message to 
all the GRMs in the cluster. Since strong consistency
between the GRMs is not required, we can use
an unreliable multicast mechanism. Client requests 
are sent to a single GRM and different clients may 
use different GRMs for load balancing. 


To enhance scalability across multiple clusters con- 
nected through the Internet, GRMs can be feder- 
ated in a hierarchical way. If a request cannot be 
resolved in a particular cluster, the GRM forwards it 
to a parent GRM in the hierarchy. The parent GRM
maintains an approximate view of the resource uti- 
lization in its child clusters and uses this informa- 
tion as a hint to locate a proper cluster to fulfill the 
client request. 


Although we have designed the protocols and
algorithms for fault-tolerance and scalability mentioned
in this subsection, their implementation is still
under way.


5 Implementation and Experimental Results


The Automatic Configuration Service is imple- 
mented as a library that can be linked to any ap- 
plication. A program enhanced with this service 
becomes capable of fetching components from a re- 
mote Component Repository and dynamically load- 
ing and assembling them into its local address-space. 
The library requires only 157Kbytes of memory on 
Solaris 7, which makes it possible to use it even on 
machines with limited resources such as a PalmPi- 
lot. In fact, we expect that services similar to this 
will be extensively used in future mobile systems to 
configure software automatically according to loca- 
tion and user requirements. 


To evaluate the performance of the Automatic
Configuration Service, we instrumented a test
application [KCN00] to measure the time for fetching,
dynamically linking, and configuring its constituent
components.


5.1 Loading Multiple Components 


Figure 6 shows the total time for the service to load 
from one to eight components of 19.2Kbytes each. 
These experiments were carried out on two Sparc 
Ultra-60 machines running Solaris 7 and connected 
by a 100Mbps Fast Ethernet network. The Com- 
ponent Repository was executed on one of the ma- 
chines and the test application with the Automatic 
Configuration Service on the other. Each value is 
the arithmetic mean of five runs of the experiment. 
The vertical bars in the subsequent graphs and the 
numbers in parentheses in Table 1 represent the 
standard deviation. As the graph shows, the varia- 
tion in execution times across different runs of the 
experiment was very small. 


Table 1 shows, in more detail, how the service 
spends its time when loading a single 19.2Kbyte 
component. The current version of the Automatic 
Configuration Service fetches the prerequisites file 
from the remote Component Repository and saves 
it to the local disk. The same is done with the file
containing the component code. Then, it uses the
underlying operating system to perform the local
dynamic linking of the component into the process
runtime.


Figure 6: Automatic Configuration Service Performance
(total time in ms versus number of components)


The table also shows the additional time spent by 
the service (row labeled as “autoconf protocol addi- 
tional operations”) to detect if there are more com- 
ponents or prerequisite files to load, to parse the 
prerequisite file, and to reify dependencies. This 
overhead accounts for 46% of the total time required 
to load the component, which suggests that it would 
be desirable to improve this part of the service by 
optimizing the implementation of the SimpleResolver 
(see Section 3.4). We believe that an optimized
version of the SimpleResolver could lead to
improvements on the order of 20% for components of this
size.


In the experiments described in this section, the 
component code and prerequisite files were cached 
in the memory of the machine executing the Compo- 
nent Repository. When the Component Repository 
program needs to read both files from its local disk, 
there is an additional overhead of approximately 20 
milliseconds. 


5.2 Components of Different Sizes 


To evaluate how the time for loading a single com- 
ponent varies with the component size, we created 
a program that generates components of different 
sizes. According to its command-line arguments, 
this program generates C++ source code containing 
a given number of functions (which include code to 
perform simple arithmetic operations) and local and
global variables. Using this program, we created
components whose DLL sizes vary from 12 to 115 
Kbytes. Figure 7 shows the time for the Automatic 
Configuration Service to load a single component as 
the component size increases.


Figure 7: Times for Loading Components of Different
Sizes


Figure 8 shows the absolute times spent in each step
of the process³. Note that the time spent
in the item labeled “autoconf protocol” is
approximately constant⁴. Hence, as the component size
increases, its relative contribution to the total time
decreases. This can be seen in Figure 9, which
shows the same data in a different form. In this 
case, the figure shows the percentage of the total 
time spent in each of the steps of the process. 
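
The argument can be made explicit with a small model. The symbols below are introduced here only for illustration and are not taken from the measurements: $t_p > 0$ is the (approximately constant) autoconf protocol time, and $t_v(s)$ is the combined time of the size-dependent steps for a component of size $s$, assumed to grow with $s$.

```latex
f(s) = \frac{t_p}{t_p + t_v(s)}, \qquad
s_1 < s_2 \;\Rightarrow\; t_v(s_1) < t_v(s_2) \;\Rightarrow\; f(s_1) > f(s_2).
```

The fraction $f(s)$ of the total time spent in the protocol therefore shrinks as components get larger, even though the absolute time $t_p$ does not change.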


As the size of the component increases, the time for 
fetching the code from the remote repository to the 
local machine becomes the dominant factor. It is
important to remember that these data were captured
in a fast local network. If the access to the
repository requires the use of a lower bandwidth
connection, then this step would clearly be the most
important with respect to performance. This suggests
that deriving intelligent algorithms for component
caching, taking component versions and user
access patterns into consideration, is an important
topic for future research.


³ These steps are the same as those presented in Table 1.

⁴ This is expected since the messages processed in this step
do not carry component code and therefore are not affected
by the size of the component.


[Table 1: only fragments survive. Row labels include
"saving prerequisites to local disk", "fetching component
from Component Repository", and "autoconf protocol
additional operations"; one entry reads 24 (0.7).]


Figure 8: Discriminated Times for Loading Components
of Different Sizes (time in ms versus component size in
Kbytes; series: loading code, saving code, fetching code,
saving prereq, fetching prereq, autoconf protocol)


Figure 9: Discriminated Percentual Times for Loading
Components of Different Sizes


Although there is still much room for improvement
and performance optimization in the protocols used
by the Automatic Configuration Service, the results
presented here are very encouraging. They demonstrate
that it is possible to carry out automatic
configuration of a distributed component-based
application within a tenth of a second, which is what we
intended to prove.


5.3 Resource Management 


We have not yet carried out an extensive perfor- 
mance evaluation of the Resource Management Ser- 
vice. However, preliminary results [Yam00] show 
that the overhead imposed by the LRMs in the in- 
dividual nodes is low and that the time for launching 
a simple remote application through the GRM and 
LRM is on the order of a tenth of a second. As future
work, we intend to carry out a comprehensive
evaluation of this service.


6 Related Work 


The OMG CORBA Component Model (CCM) spec- 
ifies a standard framework for building, packag- 
ing, and deploying CORBA components [OMG99]. 
Unlike our model, which focuses on prerequisites 
and dynamic dependencies, the CORBA Compo- 
nent Model concentrates on defining an XML vo- 
cabulary and an extension to the OMG IDL to sup- 
port the specification of component packaging, cus- 
tomization, and configuration. The CCM Software 
Package Descriptor is reminiscent of our SPDF as 
it contains a description of package dependencies, 
i.e., a list of other packages or implementations that 
must be installed in the system for a certain package 
to work. CORBA Component Descriptors, on the 
other hand, describe the interfaces and event ports 
used and provided by a CORBA component. 


We believe that our model and CCM complement 
each other and could be integrated. CCM provides 
a static description of component needs and inter- 
actions, while our model manages the runtime dy- 
namics. Although CCM was already approved by 
OMG, publicly available ORBs do not support it 
yet. Once this happens, we intend to work towards 
the integration of the two models. 


Among the major CORBA implementations, the 
one that most resembles our work is Orbix 2000 
[ION00]. Its Adaptive Runtime Architecture lets
users add functionality to the ORB by loading plug-ins
dynamically. Whenever a request is sent to the
ORB, it is processed by a chain of interceptors that
can be configured in different ways using the loaded 
plug-ins. In that way, the ORB can be configured 
with interceptors that implement security, transac- 
tions, different transport protocols, etc. When the 
ORB loads a plug-in, it checks its version and de- 
pendence information. A centralized configuration 
repository specifies plug-in availability and configu- 
ration settings. Using this architecture it could be 
relatively easy to implement the functionality pro- 
vided by our Automatic Configuration and Resource 
Management Services. 


Enterprise JavaBeans [Tho98] is a server-side tech- 
nology for the development of component-based sys- 
tems. It does not support the functionality for Au- 
tomatic Configuration and Resource Management 
provided in our system. Nevertheless, it provides 
deployment descriptors that let one define, at de- 
ployment time, the configuration of individual com- 
ponents (Beans). Instead of recording the configu- 
ration information in a text format — like our SPDF 
and CORBA’s XML formats — deployment descrip- 
tors are serialized Java classes. A deployment de- 
scriptor can customize the behavior of a Bean by 
setting environment properties as well as define run- 
time attributes of its execution context, such as se- 
curity, transactions, persistence, etc. [MH00]. 


Jini is a set of mechanisms for managing dynamic 
environments based on Java. It provides protocols 
to allow services to join a network and discover 
what services are available in this network. It also 
defines standard service interfaces for leasing,
transactions, and events [AOS+99]. When a Jini server
registers itself with the Jini lookup service, it stores 
a piece of Java byte code, called proxy, in its entry in 
the lookup service. When a Jini-enabled client uses 
the lookup service to locate the server, it receives, 
as a reply, a ServiceItem, which is composed of a
service ID, the code for the proxy, and a set of service
attributes. The proxy is then linked into the client 
address-space and is responsible for communication 
with the server. In this way, the communication be- 
tween the client and the server can be customized, 
and optimized protocols can be adopted. 


This Jini mechanism for proxy distribution can be 
achieved in a CORBA environment by using the Au- 
tomatic Configuration Service in conjunction with 
a reflective ORB such as dynamicTAO [KRL+00].
The Automatic Configuration Service would fetch
the proxy code and dynamically link it, while
dynamicTAO would use the TAO pluggable protocols
framework [OKS+00] to plug the proxy code into
the TAO framework.


Jini is normally limited to small-scale networks and 
it does not address the management of component- 
based applications and inter-component depen- 
dence. Due to the large memory requirements im- 
posed by Java/Jini, this is not yet a viable alterna- 
tive for most PDAs and embedded devices. 


The Globus project [FK98] provides a “computa- 
tional grid” [FK99a] integrating heterogeneous dis- 
tributed resources in a single wide-area system. It 
supports scalable resource management based on a 
hierarchy of resource managers similar to the ones 
we propose. Globus defines an extensible Resource 
Specification Language (RSL) that is similar to our 
SPDF (described in Section 3.4.1). RSL [Glo00] al- 
lows Globus users to specify the executables they 
want to run as well as their resource requirements 
and environment characteristics. RSL could be inte- 
grated in our system by plugging an RSLParser into 
our Automatic Configuration framework. A funda- 
mental difference between Globus and our work is 
that we focus on component-based applications that 
are dynamically configured by assembling compo- 
nents fetched from a network repository. In Globus, 
on the other hand, the user specifies the application 
to be executed by giving the name of a single exe- 
cutable on the target host file system or by giving a 
URL from which the executable can be fetched. 


Legion [GW+97] is the system that shares the most
similarities with 2K as it also builds on a distributed,
reflective object model. However, the Legion re- 
searchers focused on developing a new object model 
from scratch. Legion applications must be built us- 
ing Legion-specific libraries, compiler, and run-time 
system (the Legion’s ORB). In contrast, we focused 
on leveraging CORBA technology to build an in- 
tegrated architecture that could provide the same 
functionality as Legion, while still preserving com- 
plete interoperability with other CORBA systems. 
In addition, our work emphasizes automatic config- 
uration and dependence management, which are not 
addressed by Legion. 


Systems based on architectural connectors like
UniCon [SDZ96] and ArchStudio [OT98] and systems
based on software buses like Polylith [Pur94] sep- 
arate issues concerning component functional be- 
havior from component interaction. Our model 
goes one step further by separating inter-component 
communication from inter-component dependence. 


Connectors and software buses require that applica- 
tions be programmed to a particular communication 
paradigm. Unlike previous work in this area, our 
model does not dictate a particular communication 
paradigm like connectors or buses. It can be used 
in conjunction with connectors, buses, local method 
invocation, CORBA, Java RMI, and other methods. 
As demonstrated by our experiments with
dynamicTAO [KRL+00], the model was applied to a legacy
system without requiring any modification to its 
functional implementation or to its inter-component 
communication mechanisms. 


Communication and dependence are often inti- 
mately related. But, in many cases, the dis- 
tinction between inter-component dependence and 
inter-component communication is beneficial. For 
example, the quality of service provided by a multi- 
media application is greatly influenced by the mech- 
anisms utilized by underlying services such as vir- 
tual memory, scheduling, and memory allocation 
(e.g., through the new operator). The interaction 
between the application and these services is often 
implicit, i.e., no direct communication (e.g., library 
or system calls) takes place. Yet, if the system in- 
frastructure allows developers to establish and ma- 
nipulate dependence relationships between the ap- 
plication and these services, the application can be 
notified of substantial changes in the state and con- 
figuration of the services that may affect its perfor- 
mance. 


Research in software architecture [SG96] and dy- 
namic configuration [PCS98] typically focuses on 
the architecture of individual applications. It does 
not deal with dependencies of application components 
on system components, other applications, 
or services available in the distributed environment. 
Our approach differs in 
that, for each component, we specify its 
dependencies on all the different kinds of environment 
components and we maintain and use these dynamic 
dependencies at runtime. Approaches based on soft- 
ware architecture typically rely on global, central- 
ized knowledge of application architecture. In con- 
trast, our method is more decentralized and focuses 
on more direct component dependencies. We believe 
that, rather than conflicting with the software architecture 
approach, our vision complements it by 
reasoning about all the dependencies that may af- 
fect reliability, performance, and quality of service. 


The final solution to the problem of supporting reliable 
automatic (re)configuration may lie in the 
combination of our model with recent work in software 
architecture and dynamic (re)configuration. 
This is certainly an important open research prob- 
lem to be investigated in the future. 


7 Future Work 


Under the 2K project we have also been work- 
ing on QoS compilation techniques, addressing the 
problem of translating application-level QoS speci- 
fications to component-level QoS specifications, and 
then to resource-level QoS specifications [NWX00]. 
In the near future, our group will work on the im- 
plementation of the mechanisms for fault-tolerance 
and scalability described in Section 4.4. Security 
will be provided by a CORBA implementation of 
the standard Generic Security Services (GSS) API 
[Lin97]. 


In the previous sections, we alluded to some other 
important topics for future work, namely, (1) auto- 
matic creation and refinement of prerequisite spec- 
ifications, (2) intelligent algorithms for component 
caching taking versions into consideration, and (3) 
the integration of our dependence model with recent 
research in software architecture. 


8 Conclusions 


Component technologies will play a fundamental 
role in the next generation computer systems as the 
complexity of software and the diversity and per- 
vasiveness of computing devices increase. However, 
component technologies must offer mechanisms for 
automatic management of inter-component depen- 
dencies and component-to-resource dependencies. 
Otherwise, the development of component-based 
systems will continue to be difficult and frequently 
lead to unreliable and non-robust systems. 


Although there are still a number of open prob- 
lems for future research, we believe that this pa- 
per gives an important contribution to the area by 
presenting an object-oriented architecture for au- 
tomatic configuration and dynamic resource man- 
agement in distributed component systems. Perfor- 
mance evaluation demonstrated that our system is 
able to dynamically instantiate applications by 
assembling network components in less than a tenth 
of a second. 


Future work in our group will extend the Resource 
Management Service implementation to improve its 
fault-tolerance and scalability and enhance the syn- 
ergy between dynamic resource management and 
automatic configuration. 


Abstract 


Workstations and PCs typically are rich in re- 
sources, in contrast to palmtop devices, which are 
generally quite limited. This disparity offers chal- 
lenges to integrating these heterogeneous devices 
into a single distributed system. Services must be 
available to each device, but it may be necessary to 
modify certain services if the connected device does 
not have the desired resources. 


A key component of many distributed systems is 
remote access to data. Traditional distributed file 
systems are typically rather static and are not able 
to adapt to the current available resources of the 
devices involved. Data files are treated as contin- 
uous streams of bytes and the interfaces to access 
them are designed for unstructured data; they sim- 
ply transfer buffers of contiguous data. Providing 
modality and adapting content using these inter- 
faces proves difficult. 


In this paper, we present an adaptive data object 
service for pervasive computing environments using 
distributed objects. Data is manipulated through 
an object-oriented interface based on containers and 
iterators. The interface is also used to model data 
operations, conversions, and proxies. The system is 
aware of its environment and can instantiate objects 
in the proper locations to optimize performance. 


*This research is supported by a grant from the National 
Science Foundation, NSF 98-70736. 


1 Introduction 


The recent popularity of personal digital assistants 
(PDAs) and Web-enabled cell phones has brought 
mobile handheld computing into the mainstream. 
Users are now able to perform many tasks that were 
once restricted to larger desktop systems. Although 
these devices will almost certainly always possess 
less computing power than their desktop counter- 
parts, they will eventually offer universal access to 
the network. One of the key challenges is the in- 
tegration of these handheld devices into larger dis- 
tributed systems. The handheld devices should be- 
come an extension of the system that they can inter- 
act with in the same way that a stationary machine 
can. 


The increasing diversity of devices accessing dis- 
tributed systems makes traditional data distribu- 
tion mechanisms inappropriate, since differing de- 
vice types may require the service to behave in dif- 
ferent ways. For example, when displaying video on 
a small device, it may be better to decode MPEG 
on a nearby host and send raw pixmaps to the hand- 
held used for output. Systems that are not able to 
adapt to the current environment are therefore not 
best suited for heterogeneous distributed systems. 


An active area of research involving highly heterogeneous 
environments has been that of pervasive 
computing [Wei93, Abo99, MIT, Hew, Mic]. 
These environments consist of intelligent rooms or 
areas, containing appliances (whiteboard, video 
projectors, etc.), powerful stationary computers, and 
mobile wireless handheld devices. The large col- 
lection of devices, resources, and peripherals must 
be coordinated and access to them must be made 
simple. Such coordination may be viewed as be- 
ing analogous to the role of a traditional operating 
system. However, the heterogeneity, mobility, and 
sheer number of devices make the system vastly 
more complex [RCO00]. Applications may have the 
choice of a number of input devices, such as mouse, 
pen, or finger, and output devices, such as monitor, PDA 
screen, wall-mounted display, or speakers. An infrastructure 
for such a space must be able to locate 
the most appropriate device, detect when new de- 
vices are spontaneously added to the system, and 
adapt content when data formats are not compat- 
ible with output devices. For example, if a user 
wishes to view an on-going presentation on a small 
handheld, images of the slides could be sent to the 
roaming user, but in a format more appropriate for 
the device, such as a scaled down image to fit the 
small screen size. Moreover, more extreme transfor- 
mations may be performed, such as converting text 
data to audio. Applications should not be both- 
ered with the complexities of such conversions; they 
should gain access to data in a particular format by 
simply opening the data source as the specific de- 
sired type. The system should automatically adapt 
content to the desired format and place the conver- 
sion modules in locations to maximize efficiency. 


To address the foregoing issues, we have built a 
general data distribution service targeted at het- 
erogeneous environments, that incorporates auto- 
matic content adaptation, location awareness, and 
knowledge of environment. The design of the ser- 
vice is based on the concept of containers and 
iterators exhibited in the Standard Template Li- 
brary (STL) [SL94, MS96]; containers provide data 
manipulation operations, parsing mechanisms, and 
content transformations for structured data and 
convenient access is provided via iterators. Con- 
tainers may be instantiated in the most appropriate 
locations, and access to these components may be 
transferred among nodes, enabling containers placed 
on various nodes to communicate. The application 
programming interface uses C++ templates and 
generic programming [Mus89] concepts to hide the 
communication infrastructure and maximize code 
reuse. In the current implementation, we have used 
CORBA as the underlying middleware layer. How- 
ever, we are not restricted to using CORBA and 
are planning on porting the system to a light-weight 
communication core. 


The remainder of this paper is organized as follows: 
Section 2 gives an overview of our data service, 
including a brief description of the larger system the 
service is a part of. Section 3 describes the system 
layer of the service and the user layer containers and 
iterators, including examples. Section 4 presents 
our continuing work. Sections 5 and 6 present re- 
lated work and concluding remarks, respectively. 


2 The Data Object Service 


The Data Object Service (DOS) is the data deliv- 
ery mechanism for Gaia, an operating system for 
physical spaces we are currently developing. In the 
following sections, we describe Gaia and the design 
of the data service. 


2.1 Overview of Gaia 


Gaia is an infrastructure that exports and coordinates 
the resources contained in a physical space, 
thereby defining a generic computational environ- 
ment [Gai00]. Gaia converts physical spaces and 
the ubiquitous computing devices they contain into 
a programmable computing system. Gaia is anal- 
ogous to traditional computing systems; just as a 
computer is viewed as one object, composed of 
input/output devices, resources, and peripherals, so is 
a physical space populated with many devices. An 
operating system for such a space must be able to 
coordinate the resources available in such a space. 
Gaia is similar to traditional operating systems in that it 
manages the tasks common to all applications built 
for physical spaces. 


Gaia provides some core services, including events, 
entity presence (devices, users, and services), 
discovery, naming, location, and trading. Devices are able 
to detect when they have entered new spaces and 
can take advantage of the services available in the 
physical location. By specifying well-defined interfaces 
to devices and services, applications may be 
built generically so that they can run in arbitrary 
spaces. For example, a classroom application 
may be built that uses the physical devices in 
a room. When the user moves to a new classroom, 
the application can use the devices present in the 
new space. 


Figure 1: Servers manage their local native files and devices. The system can instantiate containers on any 
node in the system and adapt content for different device types. Rounded rectangles represent container 
instances. Hexagons represent template wrappers. 


In addition, we are developing an application model 
for pervasive computing environments that is inspired 
by the Model-View-Controller (MVC) architecture. 
The MVC components are no longer restricted 
to software entities; they may also be physical 
entities. For example, a house may be the model, 
providing data to various views of its sub-systems. 
Our model introduces an adaptor that can modify 
data from the model to a format the view desires. 


Gaia is an extension of our previous work on the 
2K operating system. 2K is a middleware operat- 
ing system using CORBA [The98] as the communi- 
cation mechanism and runs on top of existing plat- 
forms, such as Windows NT and Solaris. It uses a 
modified version of the TAO Object Request Bro- 
ker (ORB) [SC99], called dynamicTAO [KRL*00], 
that offers dynamic configuration of the ORB internal 
engine in order to adapt to the dynamic needs 
of users. 


Our initial implementation of DOS uses CORBA 
to leverage some of the standard CORBA services. 
In Gaia, we are applying these services in the con- 
text of physical spaces. Two services that are used 
heavily are the Name Service and the Trading Ser- 
vice. The Name Service allows transparent access 
to particular object references for applications. The 
Trading Service allows applications to find objects 
that satisfy specific constraints. We are using 
these services in conjunction to provide location-specific 
trading services. The Event Service is 
used with physical location detection systems (e.g., 
badges) to provide presence notifications of entities, 
such as users and devices. In addition, events are 
used to send “heartbeat” messages to the system to 
determine entity liveness. 
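The heartbeat idea can be illustrated with a small table of last-seen times. This is only a sketch under stated assumptions: the class name `LivenessTable`, the millisecond timestamps, and the timeout policy are hypothetical and are not taken from Gaia; in the real system the heartbeats would arrive as Event Service events.

```cpp
#include <cassert>
#include <map>
#include <string>

// Tracks the last heartbeat seen from each entity (user, device, service).
// Times are plain millisecond counters supplied by the caller.
class LivenessTable {
public:
    explicit LivenessTable(long timeoutMs) : timeoutMs_(timeoutMs) {}

    // Record a heartbeat event from `entity` at time `nowMs`.
    void heartbeat(const std::string& entity, long nowMs) {
        lastSeen_[entity] = nowMs;
    }

    // An entity is live if it has been heard from within the timeout.
    bool isAlive(const std::string& entity, long nowMs) const {
        std::map<std::string, long>::const_iterator it = lastSeen_.find(entity);
        return it != lastSeen_.end() && nowMs - it->second <= timeoutMs_;
    }

private:
    long timeoutMs_;                        // assumed liveness window
    std::map<std::string, long> lastSeen_;  // entity name -> last heartbeat
};
```

An entity that has never sent a heartbeat is reported as not live, which matches the intuition that presence must be announced before it can be assumed.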


Since Gaia targets pervasive computing environ- 
ments, many small devices interact with the sys- 
tem. In the future, we will use a small composable 
communication mechanism, called the Universal In- 
teroperable Core (UIC), that can communicate via 
different protocols (e.g., GIOP, SOAP) for mobile 
handheld devices that users may carry [UBI00]. The 
UIC can be composed dynamically, using only the 
required components. This allows the implementa- 
tion to be customized to small devices and allows 
these devices to interact with services using stan- 
dard protocols. In addition, devices can include 
server-side functionality, allowing them to accept 
events and method invocations. Since UIC is able 
to communicate with standard CORBA servers, we 
will be able to access the standard and custom ser- 
vices from these small handheld devices. 


2.2 Data Objects in Gaia 


Traditional distributed file systems [How88, 
SGKt85, Wel92] are generally designed for homogeneous 
environments and simply transfer data 
to the local node. However, the heterogeneous 
nature of pervasive computing environments makes 
the static configurations of traditional distributed 
file systems inappropriate, since some nodes (e.g., 
handhelds) may require additional support from the 
infrastructure. Fixed policies may preclude some 
nodes from participating in these environments. A 
data access service that is dynamically configurable 
offers modality for different device types. 


DOS is a middleware data service that makes use 
of the native operating system to manage data on 
disk. However, the service offers more than sim- 
ple access to file data. In general, data is no longer 
transported as streams of bytes (although this mode 
is supported), but as data objects. Traditional file 
system interfaces (i.e., open, read, write, close) are 
replaced with object-oriented abstractions: contain- 
ers and iterators [GHJV95, SL94]. These abstrac- 
tions are a more suitable interface for accessing data 
as objects, since iterators can return data of a cer- 
tain type and can be used to traverse the objects. 
Iterators provide the indirection needed to manip- 
ulate different containers using a single interface. 
In contrast to the standard read method for exam- 
ple, which passes a buffer to be filled in, an iterator 
returns references to objects whose size may be 
unknown a priori to the user. 


In the most basic form, containers are simply wrap- 
pers for native file data or directories, but they 
can also be much more interesting and useful objects. 
In general, containers may represent any collection 
of data, whether generated on the fly, 
gathered from disparate sources, or 
shared among distributed applications. They may 
also be used to interface with devices (e.g., writing 
a postscript file to a printer). 


Containers are constructed as CORBA objects and 
applications can communicate with them through 
ORBs. CORBA provides infrastructure for trans- 
parent and platform-independent access to remote 
(or local) objects. Objects can be instantiated on 
any host and references to these objects can be 
passed around in a simple manner. This facilitates 
the creation of containers in various locations that 
may easily be connected together, as illustrated in 
Fig. 1. Dynamic placement of objects (and their 
functionality) is critical for heterogeneous environ- 
ments to support all device capabilities. Different 
containers hold different kinds of data and CORBA 
handles the job of marshaling/unmarshaling and 
transporting the data. DOS assumes some of the 
burden generally placed on the programmer by pars- 
ing native file contents into indexed components 
that applications can manipulate more easily. 


Some containers may be instantiated on proxy 
servers. These servers generally do not provide 
clients access to disk, but rather to their CPU and 
memory. For example, a proxy [Sha86] may be used 
to perform some expensive parsing or computation 
that should not be performed on the node maintaining 
the native files (so as not to hinder other clients 
interacting with that server) or on the client (which may be too 
weak to perform the parsing itself). The system can 
configure itself by placing container objects in the 
best location, based on knowledge of surrounding 
devices, to optimize performance. 


For example, when performing a grep on a collec- 
tion of files, simply copying data as-is to a small 
handheld device and searching locally may be in- 
appropriate due to the severe resource limitations. 
It may be better to find matches on the file server 
and then transfer the resulting text to the hand- 
held, as illustrated in Fig. 2. Clients can then use 
the search results to retrieve only those files that 
are of interest. However, the system should be 
configurable to direct where such operations are 
carried out to provide optimal performance. Powerful 
desktops connected with high-speed networks 
should not burden the server with these operations; 
they should use their own resources to complete the 
task. Weak devices should instead use the server's 
computing power. Operations on data can be seen 
as containers that wrap a primitive data format and 
re-export it either as a different format or as the 
result of a transformation on the source. 


Figure 2: (a) Searching locally requires all data to 
be transferred to the local node. (b) Remote searching 
transfers only the results. 
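One way to make the placement decision concrete is a simple time estimate for each option. The sketch below is an illustration only: the paper does not specify a policy, so the cost model, the `SearchCost` fields, and the function names are all assumptions. A heavily loaded server can be modeled by lowering `serverRate`.

```cpp
#include <cassert>

enum class Placement { Client, Server };

// Hypothetical inputs to the decision (bytes and bytes per second).
struct SearchCost {
    double dataBytes;    // total size of the files to search
    double resultBytes;  // size of the matching text
    double bandwidth;    // network throughput between client and server
    double clientRate;   // search throughput on the client
    double serverRate;   // search throughput the server can spare
};

// Search locally: ship all data to the client, then scan it there.
double localSeconds(const SearchCost& c) {
    return c.dataBytes / c.bandwidth + c.dataBytes / c.clientRate;
}

// Search remotely: scan on the server, then ship only the results.
double remoteSeconds(const SearchCost& c) {
    return c.dataBytes / c.serverRate + c.resultBytes / c.bandwidth;
}

// Pick whichever placement has the smaller estimated completion time.
Placement choosePlacement(const SearchCost& c) {
    return localSeconds(c) <= remoteSeconds(c) ? Placement::Client
                                               : Placement::Server;
}
```

Under this model a handheld on a slow link is sent to the server, while a fast desktop facing a busy server searches locally, which is the behavior the paragraph above argues for.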


As another example, Fig. 1 shows (on the right) a 
container translating MPEG to bitmaps for stream- 
ing video to a Palm Pilot device. The application 
can simply retrieve objects from an MPEG con- 
tainer and direct them to a display device, unaware 
of the complexity of establishing proxies and trans- 
lating between data formats. 


3 Architectural Design 


The data service consists of two layers: a low-level 
system layer and a high-level user layer, as illustrated 
in Fig. 3. The lower layer has access to 
CORBA object references and includes a component 
to organize the storage naming hierarchy. The 
upper layer provides a simple user interface. We 
describe both layers in the following sections. 


Figure 3: Layered structure of DOS. 


3.1 User Interface Layer 


The top layer consists of user level containers and it- 
erators that hide the CORBA mechanics and reside 
in the local address space. Through a combination 
of wrapper classes and C++ templates, the user is 
presented with a clean and easy to use interface. 
Templates are used with generic programming con- 
cepts to provide a distributed generic programming 
model. 


3.1.1 Containers 


User level containers inherit from a template 
class that maintains a reference to the underlying 
CORBA container and provides methods for cre- 
ation and adaptation. Container subclasses provide 
specialized operations for a particular container type 
(e.g., setting the dimensions of a slide presentation) 
and hide the existence of the template, providing a 
clean interface for application developers. If no spe- 
cial methods are necessary for a specific container 
type, the subclass is simply a wrapper and merely 
specifies the template parameter list. As shown in 
Fig. 4, the parameter list consists of the container 
type (C), buffer type (B), and object type (O), 
which are the CORBA container, transport buffer 
(a sequence of objects), and the indexed component 
object types, respectively. In addition, the subclass 
specifies the type of iterator to use. In this way, the 
application developer never sees the existence of a 
template, merely a particular container type (examples 
are given in Section 3.2). The template glues 
together the correct combination of components for 
the container to work correctly. Containers gener- 
ally do not need to add specialized code, so creating 
a container wrapper (only specifying the parameter 
types) can be done in one line of code. 
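The wrapper pattern described above can be sketched as follows. This is a minimal illustration, not the DOS source: `ContainerTemplate`, `MockLineServant`, and `LineSeq` are hypothetical names, and a plain struct stands in for the CORBA container that a real implementation would reach through an ORB object reference.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for a CORBA container servant (hypothetical).
struct MockLineServant {
    std::vector<std::string> lines{"alpha", "beta", "gamma"};

    // Naming convention from the text: every container has getObjects().
    std::vector<std::string> getObjects(std::size_t first,
                                        std::size_t count) const {
        std::vector<std::string> out;
        for (std::size_t i = first; i < lines.size() && out.size() < count; ++i)
            out.push_back(lines[i]);
        return out;
    }
    std::size_t size() const { return lines.size(); }
};

// C = container type, B = transport buffer type, O = object type.
template <typename C, typename B, typename O>
class ContainerTemplate {
public:
    explicit ContainerTemplate(C& remote) : remote_(remote) {}

    // Fetch one object through the typed buffer B.
    O at(std::size_t index) const {
        B buffer = remote_.getObjects(index, 1);
        assert(!buffer.empty());  // index must be inside the container
        return buffer.front();
    }
    std::size_t size() const { return remote_.size(); }

private:
    C& remote_;  // reference to the underlying (here: mock) container
};

using LineSeq = std::vector<std::string>;

// The wrapper itself only fixes the template parameter list.
struct LineContainer
    : ContainerTemplate<MockLineServant, LineSeq, std::string> {
    using ContainerTemplate::ContainerTemplate;
};
```

Application code would then use `LineContainer` directly and never mention the three template parameters.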


Templates are used to provide compile-time poly- 
morphism of CORBA container types, thereby 
applying generic programming techniques to distributed 
objects. Different CORBA containers provide 
methods to get and put objects of a particular 
type. However, the names of the methods must 
adhere to a convention (getObjects()/putObjects()) 
for each container. The particular object types to 
be transferred are specified in the template parameter 
list. Therefore, the template container transfers 
data of a certain type when communicating with the 
remote ORB. 
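To illustrate compile-time polymorphism over a naming convention, the sketch below uses two unrelated mock containers that share only a `getObjects()` method and a nested `Buffer` type. All names here are hypothetical stand-ins; real CORBA containers would be generated from IDL.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Two unrelated mock containers. They share no base class, only the
// naming convention.
struct MockLineContainer {
    using Buffer = std::vector<std::string>;
    Buffer getObjects() const { return {"first line", "second line"}; }
};

struct MockFrameContainer {
    using Buffer = std::vector<int>;  // an int stands in for a video frame
    Buffer getObjects() const { return {10, 20, 30}; }
};

// Resolved at compile time: any C with a Buffer type and getObjects()
// works, and the result is fully typed (no Any, no casts).
template <typename C>
std::size_t countObjects(const C& container) {
    typename C::Buffer buffer = container.getObjects();
    return buffer.size();
}

template <typename C>
typename C::Buffer::value_type firstObject(const C& container) {
    typename C::Buffer buffer = container.getObjects();
    assert(!buffer.empty());
    return buffer.front();
}
```

A container that does not follow the convention fails to compile when used with these templates, so the mismatch is caught before any request reaches the network.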


In effect, the user level container provides a consis- 
tent view of a CORBA container, although objects 
of different types are specified in the IDL container 
descriptions and are marshaled over the network. 
This removes the need to use the CORBA type Any 
to transfer objects and, with it, the need to typecast 
objects to a specific type. 


3.1.2 Iterators 


Iterators provide a simple interface for users to tra- 
verse the structure of the data inside a container. 
They maintain the current position and cache information 
about their respective containers. Caching 
specific information about the container locally reduces 
how often the remote object must be accessed, 
thereby reducing network traffic and latency. 


Different containers require different access meth- 
ods and are associated with a specific iterator type. 
Since containers create iterator instances, the user 
is forced to use the correct iterator. The syntax for 
obtaining an iterator is identical to that of the STL; 
examples of its use are given below. 


There are two types of iterators: ObjectIterators and 
StreamIterators. ObjectIterators treat the contents 
of a container as objects, in contrast to StreamIterators, 
which view the contents as a stream of octets. 
The latter are required to provide the traditional 
view of files, as streams of bytes, efficiently. The 
implementations of iterator types differ in how they 
detect when the iterator is at the end of a container. 


Subclasses provide specific methods for traversal. 
For example, RandomObjectIterator allows random 
placement of the iterator in the container. 


Iterators are useful for retrieving remote objects in- 
crementally [HV99] and containers hide caching of 
object groups. Although the container is a collec- 
tion of items, the items need not all be loaded into 
local memory at the same time, as shown in Fig. 5. 
For example, when a user iterates over a collection 
of objects, they do not have to be individually pulled 
over from a remote server. Some number of objects 
may be prefetched and cached. The local template 
container plays the role of a buffer cache in stan- 
dard file systems [MJ86]. If an object is requested, 
but is already available, it can be retrieved out of 
the cache. However, if the object is not available, 
the next group of objects may be retrieved to local 
memory and the current object passed to the user. 
The iterator hides this caching mechanism from the 
user; objects are handled as if they were all local. 


The caching mechanism described above is used 
if data content is parsed remotely or on a proxy. 
However, if a container is resident locally, all data 
is transported to the local node and parsed there. 
Therefore, nodes with enough resources can cache 
the entire contents of a data source. 
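The group prefetching described in this section can be sketched with a small local cache in front of a mock remote container. The names `MockRemote` and `PrefetchingReader` and the fixed group size are assumptions made for illustration; the request counter simply models network round trips.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Mock remote container that counts round trips (hypothetical).
struct MockRemote {
    std::vector<std::string> objects;
    std::size_t requests = 0;

    std::vector<std::string> getObjects(std::size_t first, std::size_t count) {
        ++requests;  // each call models one network request
        std::vector<std::string> out;
        for (std::size_t i = first; i < objects.size() && out.size() < count; ++i)
            out.push_back(objects[i]);
        return out;
    }
};

// Local cache that fetches objects in groups of `groupSize`.
class PrefetchingReader {
public:
    PrefetchingReader(MockRemote& remote, std::size_t groupSize)
        : remote_(remote), groupSize_(groupSize) {}

    // Returns object `index`; refills the cache only on a miss.
    const std::string& get(std::size_t index) {
        bool hit = index >= cacheStart_ && index < cacheStart_ + cache_.size();
        if (!hit) {
            cacheStart_ = index;
            cache_ = remote_.getObjects(index, groupSize_);
        }
        assert(!cache_.empty());  // index must be inside the container
        return cache_[index - cacheStart_];
    }

private:
    MockRemote& remote_;
    std::size_t groupSize_;
    std::size_t cacheStart_ = 0;      // index of the first cached object
    std::vector<std::string> cache_;  // currently cached group
};
```

With a group size of two, reading five objects in order costs three requests instead of five; a larger group trades local memory for fewer round trips.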


3.2 Interface Usage 


The following examples illustrate the user interface. 
The first example opens a container as a stream 
of bytes in read-only mode. The container is then 
adapted to look like a container of text line objects. 
An iterator is then created and each line is printed 
to the console. Exception handling is omitted for 
clarity. 


ByteContainer b("MyFile", FS::Read); 
LineContainer l = b.as("LineContainer"); 
LineContainer::iterator i; 
for (i = l.begin(); i != l.end(); i++) 
    cout << *i << endl; 


It may be noted that although the containers are ac- 
tually different template types, assignment is han- 
dled correctly. The as() method instantiates a 
CORBA LineContainer adaptor container on the lo- 
cal node (by default). However, the user or system 
may specify that it be instantiated remotely. 


Figure 4: UML class diagram of relationship between LineContainer and RandomObjectIterator objects. 
Containers may retrieve groups of data objects and cache them (not shown) to reduce the number of 
network requests. C is Container Type; B is Buffer Type; O is Object Type. 


Typically, the system uses the as() method to pro- 
vide implicit adaptor instantiations; a container 
may be opened as a particular type directly, rather 
than having to first open it as a native container 
type and then specifying the adaptor. Therefore, 
applications can open containers in the format that 
they require and any adaptation/conversion is done 
automatically. This method is shown in the remain- 
ing examples. 
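The adaptation step can be pictured as a container that wraps a byte source and re-exports it as lines. The following is a local, in-memory sketch only: `ByteSource`, `LineView`, and `asLines` are hypothetical names, and no CORBA objects or remote instantiation are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a container of raw bytes.
struct ByteSource {
    std::string bytes;
};

// Adaptor container: wraps a byte source and re-exports it as lines.
class LineView {
public:
    using iterator = std::vector<std::string>::const_iterator;

    explicit LineView(const ByteSource& source) {
        std::string current;
        for (char c : source.bytes) {
            if (c == '\n') {
                lines_.push_back(current);
                current.clear();
            } else {
                current.push_back(c);
            }
        }
        if (!current.empty()) lines_.push_back(current);  // unterminated last line
    }

    iterator begin() const { return lines_.begin(); }
    iterator end() const { return lines_.end(); }
    std::size_t size() const { return lines_.size(); }

private:
    std::vector<std::string> lines_;
};

// Free-function analogue of the as() method: adapt bytes to lines.
inline LineView asLines(const ByteSource& source) { return LineView(source); }
```

The caller iterates over `LineView` with the usual `begin()`/`end()` pair and never sees the byte-level parsing.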


The next example illustrates how a weak device may 
view a video sequence over a very slow network 
connection. Due to the limited resources available on 
such a device, it may be incapable of decoding and 
displaying MPEG video. However, the sequence 
may be transformed to bitmap images using a con- 
verter container and then pulled by the handheld 
device [HRCMO00]. The container may need to han- 
dle the real-time nature of particular data sources, 
for example, by dropping frames if the client cannot 
keep up with the data source. It is the responsibil- 
ity of the service to install the correct converter in 
the proper location, transparent to the application 
programmer. 
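One simple way a converter container might drop frames for a slow client is a single-slot buffer that always keeps the newest frame. This policy and the class name `LatestFrameSlot` are assumptions for illustration; the paper does not say which dropping strategy the service uses, and an integer stands in for a bitmap.

```cpp
#include <cassert>
#include <cstddef>

// Single-slot buffer between a real-time source and a slow client.
// Assumed policy: keep only the newest frame and count every
// overwritten frame as dropped.
class LatestFrameSlot {
public:
    // Called by the converter for each decoded frame.
    void put(int frameId) {
        if (full_) ++dropped_;  // client has not consumed the old frame
        frameId_ = frameId;
        full_ = true;
    }

    // Called by the client; returns false if no new frame is waiting.
    bool take(int& frameId) {
        if (!full_) return false;
        frameId = frameId_;
        full_ = false;
        return true;
    }

    std::size_t dropped() const { return dropped_; }

private:
    int frameId_ = 0;
    bool full_ = false;
    std::size_t dropped_ = 0;
};
```

If the source produces three frames before the client asks for one, the client receives the third and two frames are counted as dropped.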


BitmapViewer viewer; 
BitmapContainer b("MyMPEG"); 
BitmapContainer::iterator i; 
for (i = b.begin(); i != b.end(); i++) 
    viewer.display(*i); 


It may be desirable to “display” data in a format 
different from the source format when it is more 
convenient for the user. For instance, when using 
a computer with a small screen (e.g., a cellphone), 
retrieved messages may be more easily heard than 
read. A converter could be instantiated to present 
the data in the desired format. 


AudioDevice device; 
AudioContainer a("MyMailbox"); 
AudioContainer::iterator i = a.begin(); 
int num; 
// ... get user input for message number (num) ... 
i += num; 
device << *i; 


USENIX Association 
6th USENIX Conference on Object-Oriented Technologies and Systems 37 


Figure 5: User-level containers are described using generic programming concepts to maximize code reuse. 
Typed objects are marshaled over the network. Groups of data objects can be cached in the local (generic) 
container. [Diagram labels: getObjects(), Marshal, Unmarshal, Concrete Containers; Local Node, Network, Remote Node] 


The next example illustrates how a Palm Pilot can 
view a Microsoft PowerPoint presentation. The 
system opens the presentation file with a 
PowerPointContainer (using OLE), which contains data 
objects (slides) in GIF format. It then converts 
the GIF slides to bitmap images using a 
GIF2BitmapContainer. The interface that the application 
manipulates is a BitmapContainer, which 
the GIF2BitmapContainer implements. 


BitmapViewer viewer; 
BitmapContainer p("MyPresentation.ppt"); 
BitmapContainer::iterator i; 
for (i = p.begin(); i != p.end(); i++) { 
    // ... get user input for next slide ... 
    viewer.display(*i); 
} 


The previous examples use different iterator 
types, but in a similar manner. The complexity 
of creating specific containers and iterators is 
hidden from the user. Also notice that the containers 
create iterators to handle the specific data 
object types they hold. 


3.3 System Layer 


The system layer provides access to servers and ser- 
vants via CORBA object references. A local com- 
ponent caches object references and provides name 
resolution support. Several types of system contain- 
ers exist, which are hidden by the user level contain- 
ers discussed above. The following sections describe 
the different types of containers available. In ad- 
dition, the mechanisms that exist for locating data 
and creating components are discussed. 


3.3.1 Containers 


Containers are the main abstraction for represent- 
ing data and provide methods for creation and dele- 
tion of the data objects they hold. Concrete con- 
tainers are implemented using the Gaia component 
model. Each container is built as a dynamic link 
library (on Windows) or a shared object (on Solaris). 
The component model allows the service to load, 
create, and activate container components. Decou- 
pling the containers from the service allows new 
container types to be added to the running system 
without interrupting current applications. There 
are several different container types that perform 
different roles in the system. 


File Containers. File containers enable access to 
native operating system files and directories. File 
containers parse data of different file types into indexed 
components (e.g., DirectoryContainer, MailContainer). 
Parsing meta-data can be cached 
persistently for future container accesses, thereby 
eliminating the need to determine object boundaries 
each time a container is opened. This is particularly 
useful for containers that do not change frequently. 
Altering the contents of a container invalidates the 
cache. 


There are several strategies that may be used when 
implementing a file container. Access mode and file 
size affect which strategy is employed. For example, 
if a container is accessed as read-only, the bytes on 
disk will not change. If parsing meta-data is available, 
access to indexed components only requires the server 
to seek to a component boundary (which is included in 
the parse meta-data), read in the appropriate amount of 
data, and send it to the client.¹ However, if the container 
is accessed read-write, the byte layout on disk will 
probably change. 
Information regarding the insertion and deletion of 
objects in the container is cached in memory and 
then committed once the container is released. 
Alternatively, the entire container can be loaded into 
memory (i.e., as an STL container) and insertion 
and deletion of objects are performed in memory. 
When releasing the container, the entire contents of 
the in-memory container are written to disk. This 
strategy is implemented more easily, but requires 
more memory. An area of future work is determin- 
ing when to use a particular strategy depending on 
access mode and file size. For example, large files 
should probably not be completely loaded into mem- 
ory. 
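
The read-only strategy described above can be illustrated with a short sketch. It is not the Gaia implementation: the function names `build_index` and `read_component` are hypothetical, and newline-delimited records stand in for whatever components a real file container would parse. The list of (offset, length) pairs plays the role of the cached parse meta-data.

```python
import io


def build_index(stream, delimiter=b"\n"):
    """Parse once and record (offset, length) of every component.

    The returned list stands in for the cached parse meta-data.
    """
    index = []
    offset = 0
    for record in stream.read().split(delimiter):
        if record:
            index.append((offset, len(record)))
        offset += len(record) + len(delimiter)
    return index


def read_component(stream, index, i):
    """Serve component i by seeking to its boundary and reading its bytes."""
    offset, length = index[i]
    stream.seek(offset)
    return stream.read(length)


# Hypothetical container contents: three newline-delimited messages.
data = io.BytesIO(b"first\nsecond\nthird\n")
index = build_index(data)
component = read_component(data, index, 1)
```

Once the index exists, serving a component costs one seek and one read, which is why caching the meta-data pays off for containers that rarely change.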


Some files do not contain any well-defined structure. 
Such files may be represented as a stream of bytes 
(ByteContainer), thereby supporting traditional file 
semantics. Finally, ByteContainers can be used by 
applications that want to bypass the type system of 
DOS, be it for backward compatibility or due to the 
lack of an appropriate container type. 


Processor Containers Containers can represent 
things other than standard files and directories. 
Processor containers act as “files” with dynamic 
content; the “file” is created on-the-fly. For exam- 
ple, a GrepContainer may provide the ability to per- 
form remote grep processing on files in a directory. 
This allows the computation of pattern matching to 
be performed at a remote location and the results 
transferred to the client. This reduces not only the 
computational overhead on a weak client, but network 
traffic as well. 
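
As an illustration of a processor container, the sketch below models a grep-like container whose contents are computed when it is iterated. The class name mirrors the GrepContainer mentioned above, but its constructor and the (file name, line number, line) result format are assumptions made for this example, and a temporary local directory stands in for remote storage.

```python
import re
import tempfile
from pathlib import Path


class GrepContainer:
    """Hypothetical processor container: contents are computed on demand."""

    def __init__(self, directory, pattern):
        self.directory = Path(directory)
        self.pattern = re.compile(pattern)

    def __iter__(self):
        # Matching runs where the files live; only the matching lines
        # would need to travel to the client.
        for path in sorted(self.directory.iterdir()):
            if not path.is_file():
                continue
            with path.open(encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if self.pattern.search(line):
                        yield path.name, number, line.rstrip("\n")


# Usage with a throwaway directory standing in for remote storage.
with tempfile.TemporaryDirectory() as root:
    Path(root, "a.txt").write_text("alpha\nbeta\n", encoding="utf-8")
    Path(root, "b.txt").write_text("gamma\nalphabet\n", encoding="utf-8")
    matches = list(GrepContainer(root, r"alpha"))
```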


Such remote processing may be performed on the 
server or at a proxy node. Too much processing 
on a server may slow down data access by other 
clients [SHG98]. If performed at a proxy, the server 
managing the native files acts as a traditional file 
server (just serving byte streams). Resource con- 
sumption is therefore split between two machines; 
the proxy server is used for memory and CPU, while 
the file server is used for disk access. This is managed 
quite easily through CORBA, since placing a 
component on a particular node is only a matter of 
directing a particular node to instantiate the object. 


¹ More than one component may be sent in one request. 


Converter Containers Weak devices may not 
be able to render data in its original format, or process 
containers may require data to be in a particular 
format [Wir]. Conversion of content is performed 
via a converter container, which is used to 
transcode data to a new format. Converter con- 
tainers may be created on demand or automatically, 
when it is determined that the original data format 
is inappropriate, to provide on-the-fly transcodings. 


Complex conversion may require the support of sev- 
eral converter containers; therefore, converters can 
be linked together. Converters can be created on 
different hosts, such as the local machine, the machine 
maintaining the native data, or any other machine. 
Creating a converter inserts a component 
into the flow of data and changes the container in- 
terface, similar to a module in a stream [Rit84]. 


For example, if a converter exists that transforms 
Microsoft Word documents into ASCII text format, 
a grep could be performed on Word files. Since grep 
requires ASCII text as input, the Word file would 
be opened as an ASCIIContainer, which the system 
would transparently convert in order to present the 
file in the format that grep expects. 
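
The linking of converters can be sketched as follows. This is only an illustration of the idea: `ListContainer` and `ConverterContainer` are hypothetical classes, and simple dictionaries stand in for a word-processor format, since the actual container interfaces are not shown here.

```python
class ListContainer:
    """Stand-in source container holding already-parsed objects."""

    def __init__(self, objects):
        self._objects = list(objects)

    def __iter__(self):
        return iter(self._objects)


class ConverterContainer:
    """Wraps a source container and converts each object as it flows through."""

    def __init__(self, source, convert):
        self._source = source
        self._convert = convert

    def __iter__(self):
        for obj in self._source:
            yield self._convert(obj)


# Hypothetical "rich" paragraphs standing in for a word-processor format.
rich = ListContainer([
    {"style": "bold", "text": "Quarterly report"},
    {"style": "plain", "text": "Sales grew."},
])

# First link: rich paragraph -> plain text. Second link: text -> upper case.
as_text = ConverterContainer(rich, lambda paragraph: paragraph["text"])
as_upper = ConverterContainer(as_text, str.upper)

result = list(as_upper)
```

Because each converter exposes the same iteration interface as its source, a client cannot tell whether it is reading native data or the output of a chain of converters.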


3.3.2 System Core 


The system core consists of a component that main- 
tains a cache of references to machines exporting 
storage and provides name resolution facilities. The 
core includes a prefix table mechanism² [WO86] 
and, when needed, attempts to make connections to 
available remote data servers or proxy servers listed 
in the prefix table. Each remote data server manages 
the data content on its own machine 
and is responsible for creating CORBA container 
objects on that host. A server may be started on 
the local node (if resources are available) so that 
the local disk may be accessed (if available) and 
containers can be created locally.³ The interface to 
access local and remote objects is identical, so contacting 
any server is merely a matter of getting an 
object reference to the correct server. This management 
component is a C++ class rather than a 
CORBA object, since it does not need to be accessed 
remotely. 


² Names are paths. Path prefixes, or “mount points”, are 
translated to object references. 

³ Mobile handheld devices would probably not launch the 
local server and would rely on remote servers to instantiate 
all containers. 


The local view of the storage layout (namespace) 
is constructed through the use of the prefix tables. 
The prefix tables are used for name resolution and 
to locate storage. When a new file, directory, or 
device is accessed, the local container name in the 
hierarchy is translated to the native name and the 
manager finds the correct server hosting the con- 
tent. Requests are then directed towards manager 
components (see section 3.3.4), which are responsi- 
ble for the creation/destruction of various types of 
containers. 
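
Name resolution through a prefix table can be sketched as a longest-prefix match. The class below is a simplified illustration, not the DOS code: `PrefixTable`, `mount`, and `resolve` are hypothetical names, and plain strings stand in for the object references that a real table would hold.

```python
class PrefixTable:
    """Maps path prefixes ("mount points") to server references."""

    def __init__(self):
        self._entries = {}

    def mount(self, prefix, server):
        self._entries[prefix.rstrip("/") or "/"] = server

    def resolve(self, path):
        """Return (server, native_name) for the longest matching prefix."""
        best = None
        for prefix in self._entries:
            covers = (prefix == "/" or path == prefix
                      or path.startswith(prefix + "/"))
            if covers and (best is None or len(prefix) > len(best)):
                best = prefix
        if best is None:
            raise KeyError(f"no server exports {path!r}")
        remainder = path[len(best):] if best != "/" else path
        return self._entries[best], remainder or "/"


# Hypothetical namespace: a default server, a room, and a device in the room.
table = PrefixTable()
table.mount("/", "server-default")
table.mount("/room2401", "server-room")
table.mount("/room2401/pda", "server-pda")

resolved = table.resolve("/room2401/pda/slides/talk.ppt")
```

Matching on whole path components (the `prefix + "/"` test) keeps a name such as `/room24011` from being captured by the `/room2401` mount point.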


3.3.3 Layout Manager 


The Layout Manager stores the prefix tables that 
allow machines and devices to export all or a portion 
of their storage. This manager is implemented 
as a service and may provide private local storage 
for a group or a physical space. For example, there 
may be a manager running in each space. When a 
user with a device enters a space, the device may 
obtain the storage descriptions of the space to build 
the local storage namespace. Another, more inter- 
esting, possibility is that the mobile device exports 
some of its storage. Consider a room that contains a 
projector and presentation software. The mobile de- 
vice of the user may contain the actual presentation. 
When the user enters the room, the device contacts 
the Layout Manager and informs it of which part of 
its storage it wishes to export and the room then 
adds this storage to its namespace. The user may 
then navigate with the presentation software, which 
resides in the room, to the directory containing the 
presentation of the user, residing on the mobile device. In 
such a scenario, there is no need to manually trans- 
fer files; the space automatically detects the exis- 
tence of a new storage device and incorporates it. 
Hence, the namespace (i.e., what storage the room 
is aware of) can change dynamically as new ma- 
chines and devices enter and leave physical spaces. 


3.3.4 Container Manager 


Access to each data source is initiated via a Con- 
tainer Manager. These managers act as factories 
for container creation and are the main entry point 
for gaining access to object references. Once a manager 
has successfully created an association between 
a container and a native file, processor, or converter 
container, a reference to the container is returned. 


Container Managers also assist in data content 
adaptation/conversion, as described above, by find- 
ing an appropriate converter and returning a new 
interface.⁴ Conversion may be done automatically 
by the manager when a request to open a container 
type does not match the underlying data source 
type. It may also be performed after a container 
has already been opened. This procedure is illustrated 
in Fig. 6. In order to adapt a container interface, 
a container object reference is transferred 
to the manager performing the adaptation via the 
adaptInterface() method (Fig. 6-a). The manager 
determines the type of the container and examines 
a graph to see if a possible converter exists between 
the container type that was passed and the desired 
target container type. If a suitable converter con- 
tainer is found, a concrete instance is created on 
that node (each container provides a static create() 
method to generate an instance of itself on behalf of 
the Container Manager factory), and the new con- 
verter container is given the original object refer- 
ence. The converter uses this object reference as its 
data source and knows the format of data that the 
source provides. Therefore, the converter receives 
objects of one type and sends objects of another 
type. The object reference of the newly created con- 
verter is then returned to the client, which can use 
it to get and put objects of the new type the adap- 
tor supports (Fig. 6-b). Hence, containers can be 
linked together and the data can change as it flows 
through the links. Since these converters can be 
placed on various nodes, they may act as proxies 
for weak clients. 


⁴ The new interface has the same method names, but handles 
different data object types. 


Figure 6: Container Managers enable adaptation of container interfaces. 


3.3.5 Container Descriptions 


In order for containers to be linked together to provide 
the proper conversion, a description of the containers 
must be available. We describe containers 
using XML. Each description specifies the name of 
the container component (i.e., the name of the library 
that must be loaded that contains the component), 
type of the container (file, processor, or 
converter), input data object type, output data object 
type, and an optional file type (expressed as 
a file extension) that the container is associated 
with. When a Container Manager first starts up 
(or when a new container type is added to the system), 
it reads the XML descriptions and creates a 
graph based on the input/output types. This graph 
is used to determine which containers need to be instantiated 
and in what order to perform a particular 
conversion. 
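
The use of the description graph can be sketched as a shortest-path search. The XML schema is not given in the text, so the element and attribute names below are invented for illustration, as are the converter names; breadth-first search is one plausible way to pick a chain, not necessarily the one the Container Manager uses.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Hypothetical description format; element and attribute names are invented.
DESCRIPTIONS = """
<containers>
  <container name="WordToRtf" type="converter" input="Word" output="RTF"/>
  <container name="RtfToAscii" type="converter" input="RTF" output="ASCII"/>
  <container name="AsciiToHtml" type="converter" input="ASCII" output="HTML"/>
</containers>
"""


def build_graph(xml_text):
    """Edges go from an input object type to (output type, converter name)."""
    graph = {}
    for node in ET.fromstring(xml_text).iter("container"):
        if node.get("type") == "converter":
            edge = (node.get("output"), node.get("name"))
            graph.setdefault(node.get("input"), []).append(edge)
    return graph


def conversion_chain(graph, source, target):
    """Breadth-first search: shortest list of converters, or None."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, chain = queue.popleft()
        if current == target:
            return chain
        for output, name in graph.get(current, []):
            if output not in seen:
                seen.add(output)
                queue.append((output, chain + [name]))
    return None


graph = build_graph(DESCRIPTIONS)
chain = conversion_chain(graph, "Word", "ASCII")
```

The returned list gives both which converters to instantiate and the order in which to link them; `None` signals that no conversion path exists.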


4 Continuing Work 


Our current implementation is based on the 
CORBA standard. We will port our system to the 
UIC composable communication core to provide a 
light-weight implementation that can be used on 
small devices, such as the Palm Pilot. In addition, the 
UIC can be composed to provide server-side func- 
tionality. We will develop a small server-side im- 
plementation that will allow a mobile device to au- 
tomatically be added to the storage namespace of 
a physical space once it is detected by the space. 
This will allow scenarios, such as the one described 
in section 3.3.3, to be realized. 


Another issue of future work is deciding the best lo- 
cation to instantiate containers (i.e., where to place 
proxy containers). We will use the 2K resource 
manager [Yam00] for load balancing and to deter- 
mine if a costly operation should be performed on a 
proxy node. Our service contains the mechanisms to 
place containers on various nodes, but this decision 
engine must be added. 


Currently, client applications must pull objects. 
However, there are many situations where data 
should be pushed out to a client. For example, if 
a group of users is engaged in a discussion using 
a whiteboard, remote users may wish to see the 
schematics on the board. These updates to the con- 
tents of the board should be pushed out to remote 
users so that they can view new drawings. We will 
be adding push technology to our system to facil- 
itate such scenarios by registering callbacks with 
containers. Real-time data may be streamed us- 
ing RTP packets. We are building mechanisms to 
connect containers via streams that will treat RTP 
packets as our data objects. 


5 Related Work 


Our work has a resemblance to file systems in 
some respects. Some previous systems have treated 
data as groups of data, rather than contiguous 
bytes of unstructured storage. Semantic file systems 
[GJSJ91] index data when files and directories 
are created and updated. They allow extraction of 
attributes using file-type transducers. Such a sys- 
tem provides the user with alternate views of data 
and a query mechanism for finding information. The 
Choices file system [Mad92] defines a framework for 
building different file system types. Data on sec- 
ondary storage is represented as containers and is 
parsed and indexed depending on file type. In ad- 
dition, container contents can be viewed in different 
ways. However, the system is not distributed and 
does not perform adaptation and conversions. A re- 
placement for standard file system organization has 
been proposed that logically treats files as nested 
boxes [BA99]. Remote copy operations and con- 
verters are incorporated into the design. 


The effects of mobile code were evaluated on a distributed 
file service [SHG98]. The cost of performing 
remote file operations versus increase in server 
load was measured. It was found that moving op- 
erations to the file server (i.e., agrep) is typically 
advantageous when client CPU power is below that 
of the server and network latencies are high. However, 
excessive computational load on the server can 
reduce throughput for clients simply requesting byte 
streams. Our service can be configured to perform 
such computations on a remote node when appro- 
priate. 


Our API borrows concepts from the Standard Tem- 
plate Library (STL) [SL94, Gla97], which defines 
objects for organizing collections of data. It also 
defines generic iterator objects, similar to the C++ 
stream interface [Str98], to access the data of un- 
derlying data collections. Iterators form an abstract 
interface to a number of different collection types. 
The collections are typically located in the local ad- 
dress space, requiring the local node to parse data 
into components for insertion into the collection. 
The Java stream package (java.io) [Sun] defines 
basic streams that may be adapted to add specific 
functionality. However, such adaptors may only be 
applied locally. 


Several pervasive computing projects have investi- 
gated the problem of information access and shar- 
ing in heterogeneous environments. IBM’s TSpaces 
enhances the concept of a Tuplespace by adding 
consideration for heterogeneity of devices, scalability, 
and persistence [WMLF98]. TSpaces allows distributed 
applications to share information in a decoupled 
manner and allows a high degree of interoperability 
via tuples. Their implementation includes 
support for access control, event notification, and 
efficient retrieval of information. In addition, new 
operators may be dynamically added to the server, 
which may be used immediately. This is similar to 
our design of allowing new container types to be 
spontaneously added to the system. The TSpaces 
project resembles a database system, whereas our system 
is more focused on adaptation of content. However, 
we could create a container type that was 
specifically tailored for tuples, which could be used 
as shared data among applications. 


The Infospheres project at Caltech is constructing 
an infrastructure for organizing task forces [Cha96]. 
Their goal is to build a system that allows highly 
dynamic groups to be rapidly assembled and share 
information. Other concerns are how to scale to 
billions of objects, restricting access to objects to 
authorized personnel, dealing with message delays 
over networks that may scale globally, and manag- 
ing resources by “freezing” and “thawing” objects 
when needed. This research is more focused on or- 
ganizing dynamic groups of people. 


Jini technology enables heterogeneous devices 
equipped with a Java virtual machine to discover 
services in physical spaces [Wal]. Devices may reg- 
ister themselves with the Jini lookup service. Once 
registered, other devices may discover them and immediately 
use their services. Using the code mobility 
of Java, custom user interfaces or applications 
may be sent to client devices to allow interaction 
with services or resources. The main focus of Jini 
technology is to allow devices to discover and interact with 
each other. Our system is more concerned with the 
delivery of data and adaptive content, rather than 
particular services. 


The Document Object Model (DOM) is an object- 
oriented model to represent documents as a tree of 
nodes [CBNW]. Interfaces are available to traverse 
and manipulate the tree to gain access to structured 
data. The DOM interfaces are typically used as 
a result of parsing XML documents. Such docu- 
ments can encompass an array of object types. We 
were more concerned with groups of similar objects, 
which simplifies our user interface. We could, how- 
ever, create a container that resembles the function- 
ality of DOM. 


The work most similar to ours is that of the Ninja 
project from UC Berkeley [GWvB+00]. The Ninja 
architecture defines four main components: bases, 
units, active proxies, and paths. Bases are mani- 
fested as a cluster of workstations that provide scal- 
ability, fault tolerance, and concurrency. Units com- 
prise the myriad of devices that may be connected 
to the infrastructure. The active proxies provide 
adaptation of content (similar to our containers), 
and are the result of previous research in data dis- 
tillation using the TACC [Fox] model to perform on- 
the-fly data transformations [FGBA96]. Transcod- 
ing data formats was found to greatly increase the 
performance of certain applications [FGG+98]. The 
last component, paths, constructs flows of data that 
may be transformed while passing through different 
components, using their active proxies. These are 
similar to our channels. Our methodology is slightly 
different, in that we have leveraged the features of 
CORBA and its services and approach the problem 
from an operating system point of view, whereas the 
Ninja project takes a Java-centric Internet-service 
approach. We have also focused on the user interface 
to allow simple data access and ease application 
development. 


6 Conclusions 


DOS provides the data transfer mechanism in Gaia, 
an operating system for physical spaces. DOS is 
able to alter behavior based on knowledge of com- 
puting device characteristics and location. Data is 
represented by containers and access is gained using 
iterators. Modeling files and directories as contain- 
ers unifies the interface for data distribution in our 
system; the interface is also used to model data op- 
erations, conversions, and proxies. Containers may 
be connected together as modules that can act on 
data passing through them. The system is aware 
of its environment and has the mechanisms to in- 
stantiate objects in the proper locations to optimize 
performance and provide load balancing. 


We have hidden the details of CORBA from the 
developer by using C++ templates and wrapper 
classes and have applied generic programming con- 
cepts to distributed objects. Objects of a particular 
type are marshaled over the network and typecasting 
becomes unnecessary. User-level containers and 
iterators are defined as template classes that com- 
bine the proper components together to allow type- 
safe remote access to objects. Templates have made 
the construction of user-level containers trivial. 


DOS is a dynamic, flexible system, in contrast to 
typical distributed file systems, which are designed for 
a particular operating environment. The dynamic 
nature of our data service makes it well-suited for 
heterogeneous environments, prevalent in pervasive 
computing. 


8 Availability 


Resources for DOS and Gaia are available at: 


http://choices.cs.uiuc.edu/gaia 


Abstract 


As Java becomes a viable platform for server applications, performance becomes a greater concern. An important 
aspect of Java Virtual Machine performance is its dynamic memory management system (garbage collection or 
GC). Traditional GC benchmarking often focuses on a set of fixed applications. As a result, when an actual appli- 
cation’s memory behavior differs from that of the standard benchmarks, the benchmark results do not help the 
user judge which GC implementation suits her application the best. In this paper, we present HBench:JGC, an 
application-specific benchmarking suite, based on the idea that a system’s performance should be measured in the context 
of a specific application. HBench:JGC employs a methodology that characterizes the application memory usage 
and the GC implementation independently and carefully combines both characterizations to form a single metric 
that reflects a particular application’s performance in the presence of a particular GC implementation. We evalu- 
ate our approach on Sun Microsystems’s JDK1.2.2 classic JVM with a sequential mark-sweep GC. Our results 
demonstrate HBench:JGC’s unique predictive power and its ability to provide meaningful metrics that lead to a 
better understanding of GC performance. 


1. Introduction 


In recent years, there has been a rapid increase in the 
adoption of Java technology in a variety of environ- 
ments, ranging from JVMs embedded in web-browsers 
to high-performance server products. As Java becomes 
a viable platform for server applications, performance 
becomes a greater concern. An important piece of Java 
Virtual Machine performance is its dynamic memory 
management system (garbage collection or GC). His- 
toric data show that it is quite common for garbage 
collection to account for 20% or more of an applica- 
tion’s total running time [9]. Sometimes garbage col- 
lection is the performance bottleneck. Understanding 
GC performance and selecting the right GC implemen- 
tation, therefore, can lead to significant savings in the 
total running time of the application. 


The traditional GC benchmarking approach is to pick a 
set of programs, run them with different GC algorithms, 
and compare the total elapsed times. This approach 
has been used by Smith and Morrisett, as well as 
by Zorn [12][15]. This approach is inadequate, since 
the optimal GC algorithm varies with the application 
[15], and the set of benchmark programs may not rep- 
resent the actual memory behavior of the application 
of interest. 


Another approach to benchmarking and selecting GC 
algorithms for a given application is to manually con- 
struct a small program that models the memory behav- 
ior of the application in question and when run, pro- 
duces the same memory footprint. This approach re- 
quires a high level of skill and is error-prone, 
especially when the application’s memory behavior is 
complicated. 


HBench:JGC is a benchmark suite that allows one to 
measure GC performance in the context of the applica- 
tions in which users are interested without having to 
model the applications manually. The underlying prin- 
ciple is to separate the characterization of application 
memory usage from that of the GC implementation. 
HBench:JGC includes a GC-independent profiler that 
traces an application’s memory behavior. It uses a set 
of microbenchmarks to measure the performance of a 
GC implementation in an application-independent 
way. The two characterizations are then fed to an ana- 
lyzer, which calculates the predicted GC time. Fig- 
ure 1 depicts the schema of HBench:JGC. 
HBench:JGC has the added advantage that one can use 
it to predict an application’s garbage collector per- 
formance on a target GC implementation without ac- 
tually running the application with the particular col- 
lector, as long as its performance characteristics are 
available. The GC-independence of the application 
characterization facilitates this unique flexibility. We 
can predict the application’s performance on different 
GC implementations by feeding performance characteristics 
of different GC implementations to the analyzer. 


Figure 1. Schematic View of HBench:JGC Process 


Section 2 describes the design of HBench:JGC in de- 
tail. Section 3 describes its prototype implementation. 
Section 4 presents experimental results on applying 
HBench:JGC to evaluating GC performance. Section 5 
discusses open issues and future work. Section 6 de- 
scribes related work and Section 7 concludes. 


2. HBench:JGC Design 


HBench:JGC is part of HBench, an application- 
specific benchmarking framework designed to address 
the problem that standard benchmark results do not 
reflect a particular application’s performance on a par- 
ticular system [11]. HBench:JGC is based on 
HBench’s vector-based methodology. The principle 
behind the vector-based methodology is that a sys- 
tem’s performance is determined by the performance 
of the individual primitive operations that it supports 
and that an application’s performance is determined by 
how much it utilizes the primitive operations of the 
underlying system. The running time of a given appli- 
cation can be estimated by carefully combining the 
two characterizations. A simple form of this combina- 
tion process would be to add up the costs of all primi- 
tive operations executed by the application. By sepa- 
rating characterizations of the application from that of 
the underlying system and by incorporating application 
characteristics into the benchmarking process, HBench 
can provide performance metrics that reflect the expected 
behavior of a particular application on a particular 
platform, as well as allow meaningful comparisons 
between different platforms. 
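
The simple form of the vector-based combination, adding up the costs of all primitive operations, can be sketched as a dot product of two vectors. The primitive names and all numbers below are hypothetical and chosen only to make the arithmetic easy to follow; `predict_running_time` is an illustrative name, not part of HBench.

```python
def predict_running_time(system_vector, application_vector):
    """Sum, over primitives, of (cost per operation) * (times used)."""
    missing = set(application_vector) - set(system_vector)
    if missing:
        raise KeyError(f"no measurement for primitives: {sorted(missing)}")
    return sum(system_vector[name] * count
               for name, count in application_vector.items())


# Hypothetical system vectors: microseconds per primitive operation.
system_a = {"alloc": 2, "method_call": 1, "file_read": 50}
system_b = {"alloc": 1, "method_call": 2, "file_read": 40}

# Hypothetical application vector: how often each primitive is used.
application = {"alloc": 1000, "method_call": 5000, "file_read": 10}

time_a = predict_running_time(system_a, application)
time_b = predict_running_time(system_b, application)
```

In this made-up case system B has the faster allocator and file reads, yet system A is predicted to be faster for this application because the application is dominated by method calls. This is the kind of application-specific conclusion that a fixed benchmark set cannot give.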


Although originally designed as part of the 
HBench:Java benchmark suite [14], the methodology 
of HBench:JGC described in this paper is applicable to 
GC implementations for other languages such as Lisp, 
Scheme, Smalltalk, and C++. 


2.1. GC Characterization 


2.1.1. Basic GC Concepts 


Like all memory management systems, a garbage col- 
lector implementation supports two primitive opera- 
tions, namely, object allocation and reclamation. 


The garbage collector manages the collection of free 
space from which new objects are allocated. The free 
space can be represented as a list of free blocks, a sin- 
gle chunk of contiguous space, or a combination of the 
two. 


When the allocator fails to satisfy an allocation re- 
quest, it initiates a garbage collection run. A garbage 
collection run typically starts with a marking phase, 
when live objects are identified and marked. This 
phase may be followed by one or more phases (typi- 
cally called the sweep phases) that free the space oc- 
cupied by the dead objects, making it available for 
allocation. A non-copying collector does not move the 
live objects, whereas a copying collector typically 
compacts the live objects to one end of the heap in 
order to create a large contiguous free space at the 
other end of the heap. Examples of non-copying collectors 
include the most widely adopted mark-sweep 
garbage collector [1] and its variants. Examples of 
copying collectors include the Lisp 2 collector [8], 
which is a mark-compact collector, and Cheney's two- 
space copying collector [3]. For a complete treatment 
of this topic, readers are encouraged to refer to the 
book by Jones et al. [7]. 


2.1.2. A GC Implementation Taxonomy 


Independent of the GC algorithms (e.g., copying vs. 
non-copying), we can classify GC implementations 
according to the four attributes described in Table 1. 
The first attribute represents the axis between stopping 
all execution for garbage collection and running the 
collector completely in parallel with program execu- 
tion [2]. The second attribute describes the internal 
architecture of the collector itself, whether it is se- 
quential (single-threaded) or parallel (multi-threaded). 
The third attribute describes the granularity of collec- 
tion, whether collection occurs in a single, complete 
pass (batch-oriented) or whether just some of the 
available memory is reclaimed during each iteration 
(incremental). The fourth and last attribute distin- 
guishes generational garbage collectors [10] from non- 
generational collectors. Generational collectors im- 
plement a set of heaps that are cleaned with varying 
frequency depending on the age of the objects stored 
in the heap. Each heap corresponds to a different age 
group. 


Attributes of GC Implementations 
Stop-the-world ↔ Concurrent 
Sequential ↔ Parallel 
Batch ↔ Incremental 
Non-generational ↔ Generational 

Table 1. GC Implementation Techniques 


The four attributes in the taxonomy are largely or- 
thogonal, with a few exceptions. For example, a GC 
algorithm can be both stop-the-world and parallel, but 
it cannot be both concurrent and batch mode. 


In this paper we consider only sequential, stop-the- 
world, batch-mode and non-generational garbage col- 
lectors. We chose to start with this type of collector 
because it involves the fewest variables and thus al- 
lows faster prototyping of the analytical models and 
more controllable experimentation. Furthermore, this 
type of collector is still in wide use. For example, 
Sun's standard JDK1.1 and JDK1.2 Java Virtual Ma- 
chines use this type of collector. Section 5 discusses 
how we envision enhancing our approach to cope with 
concurrent, parallel, incremental and generational gar- 
bage collectors. 


2.1.3. Object Allocation 


For a given memory management algorithm, the cost 
of object allocation is typically determined by the fol- 
lowing two factors: 


1. the size of the allocation, 


2. the state of the heap, such as the number of 
free blocks and their sizes. 


We can represent this cost with a function 
$C_{alloc}(\mathit{heap\_state}, \mathit{allocation\_size})$. Depending on the 
memory management algorithm, $C_{alloc}$ takes different 
forms. In the case of copying garbage collectors, the 
free space is a contiguous area, and allocation can be 
implemented by a simple pointer advancement. 
Therefore, in the case of a copying collector, $C_{alloc}$ is a 
constant function. In the case of non-copying 
collectors, such as a non-copying mark and sweep 
collector, the allocation time depends on the state of 
the free-block lists maintained by the collector. If we 
characterize the heap state with simple statistical 
measures, such as a normal distribution with a given 
mean and standard deviation, or a uniform distribution 
with a given range, we can represent $C_{alloc}$ in a concise 
way. Furthermore, we can measure $C_{alloc}$ using 
microbenchmarks that initialize the heap according to 
the statistical measures. 
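
The difference between the two forms of the allocation cost can be made concrete with a small sketch that counts steps instead of measuring time. Both allocators are deliberately simplified and hypothetical: a pointer-bumping allocator for the contiguous free space of a copying collector, and a first-fit scan over a free-block list for a non-copying collector. First fit is assumed here only for illustration; the text does not say which policy a given collector uses.

```python
def bump_allocate(state, size):
    """Copying-collector style: advance a pointer; always one step."""
    if state["top"] + size > state["limit"]:
        return None, 1
    address = state["top"]
    state["top"] += size
    return address, 1


def first_fit_allocate(free_blocks, size):
    """Free-list style: scan blocks in order; cost is the blocks examined."""
    for steps, (address, length) in enumerate(free_blocks, start=1):
        if length >= size:
            if length == size:
                free_blocks.pop(steps - 1)
            else:
                # Keep the unused tail of the block on the free list.
                free_blocks[steps - 1] = (address + size, length - size)
            return address, steps
    return None, len(free_blocks)


heap = {"top": 0, "limit": 1024}
_, bump_steps = bump_allocate(heap, 64)

# Hypothetical fragmented heap: (address, length) pairs.
blocks = [(0, 16), (100, 32), (200, 16), (300, 128)]
address, scan_steps = first_fit_allocate(blocks, 64)
```

The pointer-bumping cost is the same for every request, while the free-list cost depends on how many unsuitable blocks precede a fitting one, which is the dependence on heap state and allocation size described above.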


2.1.4. Object Reclamation 


An interesting aspect of garbage collection perform- 
ance is that the cost of dead object reclamation de- 
pends on the amount of live data on the heap, since the 
way a garbage collector identifies live objects is to 
traverse the connected object graph from a set of root 
objects. 


We divide the cost of object reclamation into three
parts: the fixed cost (C_fixed), the per-live-object cost
(C_live), and the per-dead-object cost (C_dead). C_fixed corre-
sponds to the fixed cost associated with a garbage col-
lection run, such as the initialization of data structures.
C_fixed normally depends only on the heap size. C_live is the
overhead measured per live object (objects that survive
the collection). For non-copying collectors, C_live is
typically constant. For copying collectors, C_live is a
function of the size of live objects, as live objects are
compacted (copied) at the end of a collection run. C_dead
corresponds to the per-object cost of releasing the
space of a dead object. In most cases, this involves
updating bookkeeping information for the freed object,
and thus C_dead is usually constant for a given collector
algorithm. In summary, the cost of object reclamation
can be represented by three functions, C_fixed(heap_size),
C_live(object_size), and C_dead. Let N_l be the distribution
function of the sizes of live objects, i.e., N_l(s) is the
number of surviving objects with size s. Let N_d be the
distribution function of dead object sizes. The total
cost of garbage collecting a heap of size h can then be
calculated using the following formula (1):


$$T_{GC} = C_{fixed}(h) + \sum_{s} C_{live}(s) \cdot N_l(s) + C_{dead} \sum_{s} N_d(s) \qquad (1)$$


The above reasoning makes the simplifying assump- 
tion that every live object is traversed exactly once 
during marking. For cases where an object is refer- 
enced by several live objects, the object will be visited 
multiple times by the collector. We characterize this 
additional cost by adding a second variable, d_i, the fan-
in degree of an object, in the per-live-object overhead
function C_live. The middle term of the formula thus be-
comes:


$$\sum_{s} \sum_{d_i} C_{live}(s, d_i) \cdot N_l(s, d_i)$$


The situation is further complicated by the fact that 
certain copying collectors need to update an object’s 
references, if the objects it points to are copied to a 
different place. We characterize this additional cost by 
adding yet another variable, d_o, the fan-out degree of
an object, in the per-live-object overhead function C_live.
The middle term now becomes: 


$$\sum_{s} \sum_{d_i} \sum_{d_o} C_{live}(s, d_i, d_o) \cdot N_l(s, d_i, d_o)$$


The difficulty of characterizing object reclamation
costs lies in deriving the three cost functions C_fixed, C_live,
and C_dead using results from microbenchmarks. Our
experience indicates that the simplified formula (1) for
estimating GC time works well in practice for a mark-
sweep GC algorithm. In the future, we will include the
refinements discussed above if necessary.
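
As a minimal sketch of the simplified formula (1), the following Python function evaluates the three terms from given cost functions and size histograms. The function and parameter names are illustrative and not part of HBench:JGC, and the numbers in the usage lines are made up.

```python
from typing import Callable, Mapping


def gc_time(heap_size: float,
            c_fixed: Callable[[float], float],
            c_live: Callable[[int], float],
            c_dead: float,
            n_live: Mapping[int, int],
            n_dead: Mapping[int, int]) -> float:
    """Evaluate formula (1).

    n_live and n_dead map an object size to the number of live or dead
    objects of that size (the distribution functions of the text).
    """
    fixed_cost = c_fixed(heap_size)
    # Per-live-object cost may depend on object size.
    live_cost = sum(c_live(size) * count for size, count in n_live.items())
    # Per-dead-object cost is a constant in the simplified formula.
    dead_cost = c_dead * sum(n_dead.values())
    return fixed_cost + live_cost + dead_cost


# Made-up coefficients and histograms, for illustration only.
estimate = gc_time(
    heap_size=10,
    c_fixed=lambda h: 2.0 * h,
    c_live=lambda s: 0.5,
    c_dead=0.25,
    n_live={16: 4},
    n_dead={16: 8, 32: 4},
)
print(estimate)
```

The fan-in and fan-out refinements would only change the key of n_live (from a size to a tuple of size and degrees) and the arguments of c_live; the structure of the sum stays the same.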


2.2. Application Characterization 


The following metrics describe an application’s mem- 
ory usage behavior: 


1. Object allocation rate (both in terms of the num- 
ber of objects and the number of bytes); 


2. Object death rate (both in terms of the number of 
objects and the number of bytes); 


3. Object age (the time an object remains alive); 


4. Connectivity of the live object graph, i.e., the 
number of references to an object (fan-in degree) 
and the number of references it contains (fan-out 
degree). 


Some of the metrics, such as object allocation rate, can 
be obtained quite easily. Some other metrics, such as 
object age, are difficult to measure and can only be 
estimated using profiling tools. 


One significant challenge in characterizing an applica- 
tion’s memory behavior is that of GC (and JVM) inde- 
pendence. For example, if we use the number of ob- 
jects per second as the unit for object allocation speed, 
it is not portable to other JVM or GC implementations, 
as this unit is system dependent. To solve this prob- 
lem, we use objects per bytecode as our basic unit for 
both object allocation rate and object death rate. 


2.3. Predicting GC Time 


Object allocation cost is an important part of the per- 
formance metric of GC systems. It is, however, not 
directly measurable for a given application. As a first 
step, this paper focuses on predicting the time the ap- 
plication spends on garbage collection, or the time 
between the start and finish of a garbage collection 
run. Unless otherwise specified, GC time refers to the 
cost of object reclamation, and does not include allo- 
cation costs. 


The total GC time of an application can be determined 
by two factors: the number of GC runs and the time for 
each GC run. 


With the knowledge of object allocation rate and ob- 
ject death rate, one can estimate the amount of live 
data at a given execution point, from which one can 
then calculate the number of GCs deterministically, 
assuming a heap that is fixed-size or one whose growth 
policy is known a priori. 

The time for each GC run can be estimated using for-
mula (1) described in section 2.1.4. The total GC time
is the sum of times of all individual GC runs.


3. HBench:JGC Implementation 


As depicted in Figure 1, the major components of 
HBench:JGC are: the profiler that traces an applica- 
tion’s memory behavior, the set of microbenchmarks 
whose measurement results form the characterization 
of the given garbage collection implementation, and 
finally, the analyzer that estimates the GC time given 
both application and GC characterizations. The follow- 
ing three subsections describe each component in more 
detail. 


3.1 Profiler 


Sun Microsystems’s JDK 1.2.2 provides an interface 
called the Java Virtual Machine Profiling Interface 
(JVMPI) [6] that allows one to attach a profiling agent 
to the JVM at startup time. The agent can register for 
events in which it is interested through callback func- 
tions and intercept the events as they occur. 


We are interested in the following events: GC start and 
finish, object allocation, free and move, heap dump 
and object dump. Object allocation and free events can 
be used to estimate object lifetimes and the number of 
free/live objects at a given execution point. Heap 
dumps help determine the object connectivity such as 
fan-in and fan-out degrees. Our current implementa- 
tion includes all the events except heap and object 
dump. 


3.2. Microbenchmarks 


The goal of microbenchmarking is to measure the 
fixed and per-object costs of memory reclamation. Our 
first microbenchmark deals with singly linked list
data structures. We are in the process of creating mi- 
crobenchmarks that model more complicated object 
types with different fan-in and fan-out degrees. 


The microbenchmark first populates the heap with an
array of linked lists of objects. The size of the array, the
length of the list, and the object size can all be dy-
namically configured with command-line options. The
microbenchmark then explicitly invokes garbage col-
lection at three different times:


1. When all objects on the heap are alive; 


2. When all objects on the heap are reclaimable, i.e.,
after the microbenchmark sets the pointers to the 
heads of the linked lists to null; 


3. When the heap is entirely empty, i.e., after the GC 
following step 2. 


To measure C_fixed, we run the microbenchmark with
different heap sizes, fixing the other two parameters.
We then plot the GC times measured in step 3 above
against the heap sizes. The resulting regression for-
mula is the approximate function for C_fixed.


Similarly, to measure C_live, we run the microbenchmark
with varying numbers of objects, fixing the other two
parameters. The GC times measured in step 1 above
are then plotted against the number of objects for a
given object size s and the resulting regression func-
tion defines C_live(s). Since C_live might also depend on
object sizes, we again repeat the microbenchmark for
different object sizes.


The same process is performed to measure C_dead, except
that in this case the GC times of step 2 are used.
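
The regression step can be sketched with an ordinary least-squares line fit. The text does not say which regression tool was used, so the helper below (fit_line, a hypothetical name) is only one way to do it, and the data points are synthetic rather than measured.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic "step 3" measurements: GC time (ms) of an empty heap
# for several heap sizes (MB). These numbers are made up.
heap_sizes_mb = [16, 32, 48, 64]
gc_times_ms = [2.0 * h + 0.5 for h in heap_sizes_mb]

slope, intercept = fit_line(heap_sizes_mb, gc_times_ms)
print(slope, intercept)
```

The slope is the per-megabyte fixed cost. The same fit applied to the step 1 and step 2 timings, plotted against the number of objects, gives the per-live-object and per-dead-object costs for one object size.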


3.3. Analyzer 


Given both the application and GC characterizations, 
the analyzer tries to estimate the time the application 
spends on garbage collection. The analyzer also needs 
certain configuration information, such as the heap 
size, in order to determine the total GC time. Note that 
heap sizes may change dynamically. For example, if 
the memory system cannot satisfy the allocation re- 
quest even after a GC, or if the percentage of free 
space is below a certain threshold, the heap is ex- 
panded. The policies as to when and how much to ex- 
pand the heap should be specified to the analyzer. 

Table 2. Test Configurations

JVM Version | GC Algorithm
1.2.2 Classic | Mostly mark-sweep


4. Experimental Results 
4.1. Experimental Setup 


We ran our experiments on Sun Microsystems’s 
JDK1.2.2 classic version on three different machine 
configurations. Table 2 shows the hardware properties. 


Sun Microsystems’ JDK1.2.2 classic JVM uses a
mark-sweep (with compaction) collector. Mark-sweep
collection is one of the classical garbage collection
algorithms that remain in wide use today. Due to its
conservative nature, it is popular for type-unsafe lan- 
guages such as C/C++. The collector of the JDK1.2.2 
classic JVM is a variation of the classical mark-sweep 
collector — it occasionally moves live objects around 
the heap. Although compaction does not occur often 
for the applications we tested, it does generate some 
uncertainties that make it harder to predict the GC 
time. 


We use Java applications included in the SPECJVM98 
benchmark suite [13] to evaluate the predictive power 
of our approach. Most SPECJVM98 applications in- 
duce extensive GC activities, except _222_mpegaudio, 
which is excluded from our set of test applications. 
Table 3 shows the number of bytes allocated by each 
test application quoted from the benchmark’s docu- 
mentation. The actual numbers appear to differ but the 
magnitude is the same. 


SPEC Application | Allocation (MB)
_201_compress | 334
_202_jess |
_209_db |
_213_javac |
_228_jack |

Table 3. GC Activity of Test Applications






4.2. Microbenchmark Results 


We report the GC times of the three steps described in 
Section 3.2. Unless otherwise specified, all data points 
reported in this section are means of 10 runs of the 
microbenchmark. In most cases, the standard deviation 
is within 1%. 


4.2.1. GC on Empty Heap 

Figure 2. GC Time of Empty Heap on Sun
SPARC. We use an object size of 28 bytes, 
and 512 lists each with 512 objects. The 
number of objects and the size of objects re- 
main fixed as the total heap size varies. 


Figure 2 shows the garbage collection times of an 
empty heap (see step 3 in section 3.2) on the Sun 
SPARC workstation. The regression formula indicates 
that GC times of empty heaps are linearly dependent 
on the size of the heap and that the per-megabyte cost 
of an empty heap GC for this particular GC implemen-
tation is 3.75 ms. The y-intercept (0.02) is negligible.
We therefore derive the following formula for C_fixed(h)
(described in Section 2.1.4) for this GC algorithm:


$$C_{fixed}(h) = 3.75 \cdot h,$$


where h is the size of the heap in megabytes. The
value of the slope (3.75) remains the same (variations
within 5%) for different object sizes and numbers of
objects.


Similar results were obtained for the other machine 
configurations, albeit with a different slope value. 


4.2.2. GC on Fully Reclaimable Heap 

Figure 3. GC Time of Fully Reclaimable
Heap With Respect to Heap Size on Sun 
SPARC. We use an object size of 28 bytes, 
and 512 lists each with 512 objects. The 
number of objects and the size of objects re- 
main fixed as the total heap size varies. 


Figure 3 shows the garbage collection times of a fully 
reclaimable heap (see step 2 in section 3.2). The GC 
time again shows a linear dependence on the size of 
the heap, and the slope value (3.73) is close to the 
slope value of C_fixed (3.75). If we remove the fixed cost
C_fixed (empty heap), the remaining time is essentially
independent of heap size. Since all objects on the heap
are free and are reclaimed by the collector, this re-
maining time, when divided by the number of dead
objects, represents the per-dead-object cost C_dead. In
this particular case, C_dead takes on a value of
108.6/(512*512) ms, or about 0.4 µs/object. Again, similar re-
sults are observed from runs on the other machine con-
figurations.


Theoretically, C_dead is independent of object size, since
dead objects are neither scanned nor copied. However,
to our surprise, our measurements suggest that C_dead is
indeed dependent on object size. Figure 4(a) shows the
results on the Sun SPARC workstation. The GC time 
seems to grow as the object size increases, until the 
object size hits 60 bytes, and stays at around 180ms 
thereafter. We do not have a conclusive explanation 
for this behavior but we hypothesize that the depend- 
ence on the object size is due to memory cache effects. 

Figure 4. GC Time (Excluding Fixed
Overhead) of Fully Reclaimable Heap
with Respect to Object Size. (a) Results on
Sun SPARC; (b) Results on Pentium Pro;
(c) Results on Pentium III. The GC time
is calculated from the regression formula as
shown in Figure 3.


Our experiments on the other two machine configura- 
tions seem to confirm our hypothesis. Figures 4(b) and 
4(c) show the results on the Pentium Pro and Pentium 
III machines respectively. In both cases, C_dead shows
similar dependence patterns. C_dead is independent of
object size, except when the object size is less than 28 
bytes. The memory effects seem to be smaller for 
these two configurations than for the Sun SPARC 
workstation. 

Figure 5. GC Time of Fully Live Heap
with Respect to Heap Size on Sun 
SPARC. We use an object size of 28 bytes, 
and 512 lists each with 512 objects. The 
number of objects and the size of objects 
remain fixed as the total heap size varies. 


4.2.3. GC on Fully Live Heap 


Figure 5 shows the garbage collection times of a fully 
live heap (see step 1 in section 3.2). In this case, all 
objects on the heap are live and survive the garbage 
collection. Similar to the case of a fully reclaimable 
heap, the GC time shows a linear dependence on the 
size of the heap. If we exclude the fixed cost C_fixed, the
remaining time is independent of heap size. The GC
time, when divided by the number of total objects on
the heap, yields the per-live-object cost C_live. In this
particular case, C_live takes on a value of
229.1/(512*512) ms, or about 0.9 µs/object. Again, similar
results are observed from runs on other machine con-
figurations.


Figures 6(a), (b) and (c) show C_live as a function of
object size on the Sun SPARC workstation, the Pentium
Pro machine, and the Pentium III machine, respec-
tively. We observe patterns similar to those of the fully
reclaimable heap case, albeit with different threshold
values. For the Sun SPARC workstation case, the
value of C_live seems to grow as the object size in-
creases, until the object size hits 60 bytes and stays at
approximately 380ms thereafter. For the Pentium Pro
machine case, the value of C_live seems to oscillate be-
tween 600ms and 700ms after the object size hits 28
bytes. Similarly, for the Pentium III machine case, the
value of C_live oscillates between 220ms and 250ms after
the object size hits 28 bytes. Since no objects are cop-
ied, C_live should be independent of object size. We
therefore attribute this observed dependence on object
size to memory cache effects. This effect is also de-
tected with two anomalous data points for the Pentium
Pro configuration: at object sizes of 60 bytes and 124
bytes. There is also a similar anomalous data point for
the Pentium III at an object size of 124 bytes. We are still
investigating what exact memory effect causes the
anomalies.

Figure 6. GC Time (Excluding Fixed
Overhead) of Fully Live Heap
with Respect to Object Size. (a) Results on
Sun SPARC; (b) Results on Pentium Pro;
(c) Results on Pentium III. The GC time
is calculated from the regression formula
as shown in Figure 5.


Figure 7. Cumulative Object Size Distribution in Number of Objects


4.3. Predicting GC Time 


In this section we demonstrate how the microbench- 
mark results can be used to predict garbage collection 
time for a given Java application. 


First, we calculate the values of the three functions
that characterize a GC algorithm, namely, C_fixed, C_live,
and C_dead. Table 4 shows the coefficient values of the
three functions for the JVM on the Sun SPARC
workstation. For objects with size larger than 132
bytes, the values for 132 bytes are used.


Next we obtain characterizations of the applications’ 
memory behavior. Our current profiler implementation 
generates information such as the number of live ob- 
jects, the number of dead objects, and the object size 
distribution. Assuming that live and dead objects have 
the same size distribution, we can approximate the GC
time function T_GC (section 2.1.4) with the following
formula:


$$T_{GC} = C_{fixed}(h) + L \sum_{s} C_{live}(s) \cdot n(s) + D \sum_{s} C_{dead}(s) \cdot n(s)$$


where n(s) is the normalized object size distribution
function, i.e., n(12) is the percentage of objects with
size equal to 12 bytes, L is the number of live objects
and D is the number of dead objects. Figure 7 shows
the cumulative object size distribution function for
the test applications. Applications such as db and
mtrt are dominated by one object size, whereas other
applications use multiple object sizes. In general, the
majority (more than 90%) of objects are small, i.e.,
less than 100 bytes, except for compress.
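
The approximation above can be sketched in Python as follows. The names make_cost_lookup and predict_gc_time are hypothetical, and all coefficients are made up rather than taken from Table 4. The lookup helper reuses the last tabulated row for larger objects, as described for sizes above 132 bytes; choosing the largest tabulated size not exceeding the object size for in-between sizes is an assumption of this sketch.

```python
import bisect


def make_cost_lookup(table):
    """Build a per-object cost function from a {object_size: cost} table.

    Assumption: a size between two tabulated sizes uses the row of the
    largest tabulated size not exceeding it; sizes beyond the table use
    the last row, and sizes below the table use the first row.
    """
    sizes = sorted(table)

    def cost(size):
        index = bisect.bisect_right(sizes, size) - 1
        return table[sizes[max(index, 0)]]

    return cost


def predict_gc_time(heap_mb, live_objects, dead_objects, size_fraction,
                    c_fixed_per_mb, c_live, c_dead):
    """Approximate GC time; size_fraction maps size -> fraction of objects."""
    live_per_object = sum(c_live(s) * f for s, f in size_fraction.items())
    dead_per_object = sum(c_dead(s) * f for s, f in size_fraction.items())
    return (c_fixed_per_mb * heap_mb
            + live_objects * live_per_object
            + dead_objects * dead_per_object)


# Made-up coefficient tables (ms per object) and size distribution.
c_live = make_cost_lookup({12: 0.001, 28: 0.002, 132: 0.003})
c_dead = make_cost_lookup({12: 0.0005, 28: 0.001, 132: 0.002})
estimate = predict_gc_time(heap_mb=32, live_objects=1000, dead_objects=3000,
                           size_fraction={12: 0.5, 28: 0.5},
                           c_fixed_per_mb=3.75, c_live=c_live, c_dead=c_dead)
print(estimate)
```

Because the same normalized distribution is used for live and dead objects, the two weighted sums depend only on the collector and on the size mix of the application; only L and D change from one GC run to the next.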


So far our formula has not taken into consideration the
cost of the occasional copying performed by the col-
lector. For our test cases, copying only occurred in two
applications in four GC invocations (out of a total of
thirty-five GC invocations). Three of those four GC
invocations were explicit garbage collections made by
the application, which trigger unnecessary copying.
Currently we approximate this copying overhead by
dividing the number of bytes copied by the memory
bandwidth, and we use the actual number of bytes cop-
ied. In the future, we will enhance our analyzer to es-
timate this information from the application memory
characterization, assuming that the algorithm that de-
cides when to perform a copy is known. We will also
explore techniques to design microbenchmarks that
would trigger a copy and measure the cost directly.


Object Size | C_fixed (Per MB) | C_live (Per Object) | C_dead (Per Object)
 | | | 3.02E-04
 | | | 3.49E-04
 | | | 4.03E-04
 | | | 4.71E-04
 | | | 5.49E-04
 | | | 6.15E-04
 | | | 6.85E-04
108 | 3.75 | 1.33E-03 | 6.87E-04

Table 4. GC Characteristics on 333MHz
UltraSPARC IIi


Figure 8. Predicted versus Actual GC Times. All tests were run on the Sun SPARC worksta-
tion using a heap size of 32MB, except for javac and mtrt, which were run on a heap size of
64MB to eliminate the variation on the number of GCs from different runs.


Figure 8 shows the predicted versus actual GC running 
times for the six SPEC applications on the Sun SPARC 
workstation. A summary of the percentage time differ- 
ence between the predicted and the actual GC times is 
presented in Table 5. 


For compress (Figure 8(a)), there are five garbage 
collections during the execution of the compress appli- 
cation. The predicted GC times match the actual times 
quite closely (with 0.2% error rate), showing that our 
prediction model works well in this case. In the fourth 
GC run, the collector copied certain live objects to the 
beginning of the heap, which accounts for the boost in 
the GC time. The result shows that our approximation 
on the copying time works well in this case also. 


Figures 8(b), 8(c), 8(e) and 8(f) show the results for 
jess, db, mtrt and jack, respectively. The pre- 
dicted times track the actual times quite closely. No 
copying occurred in these cases. 


Figure 8(d) shows the results for javac. The pre-
dicted times track the actual times nicely except for the
3rd, 5th, and 7th GC runs. It turns out that these three
GCs were invoked explicitly by the application at times 
when the heap space had not been exhausted and most 
objects on the heap were live objects. The explicit GCs 
also trigger unnecessary copying of live objects. In this 
case, our approximation of the copying cost does not
work well. This might be due to the fact that the ap- 
proximation does not include the overhead for initiat- 
ing a copy, therefore it underestimates the cost in cases 
when many small objects are copied. 


In summary, HBench:JGC is able to predict the actual 
GC times within 10% for five out of the six applica- 
tions (Table 5). In the case of javac, the error rate is 
-6.4% if we disregard the three explicit GCs. The re- 
sults demonstrate that the vector-based methodology 
used by HBench:JGC is a promising technique for pre-
dicting application performance. In addition, we be-
lieve that when equipped with a better profiler and
analyzer, the prediction accuracy of HBench:JGC can
be improved further.


SPEC Application | Time Difference (%)
_201_compress |
_202_jess | 04
_209_db | 08
_213_javac | -15.8 (-6.4*)
_228_jack | 05

* Results if we discard 3 explicit GCs.

Table 5. Summary of Predicted vs. Ac-
tual GC Times


5. Discussion and Future Work 


In this section we discuss issues that might arise when 
using HBench:JGC on more sophisticated GC imple- 
mentations such as those presented in Section 2.1.2, 
and how we plan to address these issues. 


Concurrent garbage collection presents some technical 
challenges. With concurrent garbage collection, the 
application can continue to allocate new objects and 
access objects on the heap while a garbage collection
is in progress. Measuring the GC time is difficult be-
cause the GC time is dispersed in application execu-
tion time. We plan to approach this problem in the
following way. We run a standard Java application 
without garbage collection, and then we run the same 
application with an additional thread that continuously 
allocates objects and invokes garbage collection. The 
performance degradation observed when the applica- 
tion is run with the additional GC intensive thread 
should be a good approximation of the GC time. 


Many concurrent collectors are also incremental. 
Therefore, we will need to estimate the percentage of 
the heap that is scanned by the collector. In most 
cases, an incremental collector sets an upper bound on 
the number of root objects to be processed, from which 
one can estimate the number of objects on the heap to 
be scanned. 


Predicting the performance of parallel garbage collec- 
tors can be potentially difficult because the speed-up 
of a parallel GC run over its sequential counterpart 
depends not only on the degree of parallelism, but also 
on how balanced each thread’s load is and the interac- 
tions between the threads such as lock contention. 
Analyzing performance of multi-threaded applications 
in general is still an active area of research. 

To apply HBench:JGC to generational garbage collec-
tors, we model the collector performance for each gen-
eration, and then combine them together to form the
total GC time. To achieve that, our profiler needs to be
enhanced with the capability to estimate the object life
expectancy. Furthermore, our analyzer should be able
to predict when objects are promoted to older genera-
tions, i.e., it needs to know the age threshold for
promotion. Some GC implementations make this
knowledge public. For implementations that do not, 
we need to design our microbenchmark suite such that 
it can deduce the age threshold by creating and delet- 
ing objects at different rates. 


Currently, the memory cache effect is included in our 
cost functions as a function of object size. Our results 
indicate that in some cases, this simple model might 
be insufficient. We are investigating ways to model 
the memory cache hierarchy explicitly. 


Our short-term goal is to experiment on more garbage 
collector implementations and include more applica- 
tions in our experiments. In the long run, we expect to 
refine our model to cope with more sophisticated GC 
implementations and incorporate HBench:JGC into the 
HBench:Java suite, in order to more accurately predict 
a Java application’s total running time. 


6. Related Work 


Many researchers have studied the performance of 
dynamic memory management [5][15]. This literature 
provides a good foundation for understanding the in- 
herent cost of dynamic storage allocation. Our ap- 
proach differs in the goals we try to achieve. We em- 
phasize predictability — the ability to predict applica- 
tion performance on different GC implementations 
without running the application on target implementa- 
tions. In contrast, past research has focused on com- 
paring the cost of memory management by running a 
set of popular applications on target memory manage- 
ment implementations. 


Knuth [8] presents a comprehensive analysis and com- 
parison of the time complexity of several dynamic 
storage management algorithms. This systematic ap- 
proach to benchmarking memory management algo- 
rithms offers insight into the efficiency of these algo- 
rithms and helps explain the performance differences. 
However, the analysis assumes certain statistical prop- 
erties for both memory allocation and liberation pat- 
terns and only applies when the system reaches equi- 
librium. 


In [4], Cohen et al. compare performance of four com- 
pacting algorithms using analytical models. The ana- 
lytical models are parameterized by the amount of 
work to be done, such as the number of cells (objects), 
number of pointers (links) and related information, and 
the time to perform the basic operations common to all 
compactors, such as the time to test a conditional ex- 
pression. Their goal is similar to ours in that they also 
try to estimate GC execution times “without resorting 
to empirical tests”. The main difference lies in the
level of abstraction used for the primitive (elementary) 
operations. Their primitive operations are low-level 
machine instructions, whereas we conglomerate all 
machine instructions performed on an object into a 
single per-object operation (e.g., per-live-object over-
head). Because their primitives are at such a low level,
their models are more elaborate and require intimate 
knowledge of the algorithms (i.e., the complete source 
code). Furthermore, as computer architectures become 
more advanced, machine-level optimizations and the 
memory cache hierarchy could introduce significant 
side effects such that the analytical model will no 
longer be applicable. In our case, the cost of primitives 
is measured explicitly by the microbenchmark and 
therefore includes these side effects. 


7. Conclusion 


HBench:JGC is a vector-based, application-specific 
benchmarking framework for evaluating garbage col- 
lector performance. Our results demonstrate 
HBench:JGC’s unique predictive power. By taking the 
nature of target applications into account and offering 
fine-grained performance characterizations of garbage 
collectors, HBench:JGC can provide meaningful met- 
rics that help better understand and compare GC per- 
formance. 


Abstract


It is well known that distributed systems pose se- 
rious difficulties concerning memory management: 
when done manually, it leads to memory leaks and 
dangling references causing applications to fail. We 
address this problem by presenting a distributed 
garbage collection (DGC) algorithm for distributed 
systems supporting replicated data over wide area 
networks. 

Current DGC algorithms are not well suited for 
such systems because either (i) they do not consider 
the existence of replication, or (ii) they impose se-
vere constraints on scalability by requiring causal
delivery to be provided by the underlying commu- 
nication layer. 

Our algorithm solves these problems by (i) adapt- 
ing classical reference-counting DGC algorithms 
that were conceived for non-replicated systems (e.g. 
indirect reference-counting, SSP chains, etc.), and 
(ii) improving our previous algorithm for replicated 
systems (i.e. Larchant). 

The result is a DGC algorithm that, besides be-
ing correct in the presence of replicated data and inde-
pendent of the protocol that keeps such replicas
coherent among processes, does not require causal
delivery to be ensured by the underlying communi-
cations support. In addition, it has minimal perfor-
mance impact on applications.


1 Introduction 


Modern distributed applications sharing long-term 
data over many places, geographically separated, 
appear each day. Typical examples are found in 
the fields of concurrent engineering, cooperative ap- 
plications, etc. 

Manual memory management is extremely dif-
ficult when developing the aforementioned dis-
tributed applications. The reason is that graphs
of reachability are large, widely distributed and fre-
quently modified through assignment operations ex-
ecuted by applications. In addition, data replicated 
in many processes is not necessarily coherent mak- 
ing manual memory management much harder. For 
these reasons it is impossible to do manual memory 
management without generating dangling references 
and/or memory leaks. 

Automatic memory management, also known as 
Garbage Collection (GC), is the single realistic op- 
tion which is able to maintain referential integrity 
(i.e. no dangling references or memory leaks) in 
Wide Area Replicated Memory (WARM) systems. 
As a result, program reliability and programmer 
productivity are clearly improved. 


1.1 Shortcomings of Current Solutions 


Current DGC algorithms [1, 15] are not well suited 
for WARM systems based on data-shipping because 
of the following drawbacks: either (i) they do not 
consider the existence of replication, or (ii) they im- 
pose severe constraints on scalability by requiring 
causal delivery to be supported by the underlying 
communication layer. 

The first drawback, i.e. not considering repli- 
cated data, concerns all the classical DGC algo- 
rithms that were designed for function-shipping 
based systems, such as Indirect Reference Count- 
ing (IRC) [14] or SSP Chains [16]. As a matter of 
fact, these algorithms are not safe in presence of 
replicated data, as explained now. 

Consider Figure 1 in which an object x is repli- 
cated in processes i and j; each replica of x is noted x; 
and xj, respectively. Now, suppose that x; contains 
a reference to an object z in another process k, x; 
points to no other object, x; is locally unreachable 
and x; is locally reachable!. Then, the question is: 


Locally (un)reachability is related to (un)accessibility 
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process i 


process j 


root 





process k 


Figure 1: Safety problem of current DGC algo- 
rithms which do not handle replicated data: z is 
erroneously considered unreachable. 


should z be considered garbage? Classical DGC al- 
gorithms consider that z is effectively garbage. How- 
ever, this is wrong because, in a WARM system, 
it is possible for an application in j to “acquire” a 
replica of x from some other process, in particular, 
xj’. Thus, the fact that x; is locally unreachable in 
process i does not mean that x is globally unreach- 
able; as a matter of fact, x; contents can be accessed 
by an application in process j by means of an “ac- 
quire”. Therefore, in a WARM system, a target 
object z is considered unreachable only if the union 
of all the replicas of the source object, x in this ex- 
ample, do not refer to it. We call this the Union 
Rule (more details in Section 4.2.2). 

The second drawback, i.e. imposing severe 
constraints on scalability, affects current DGC al- 
gorithms conceived for WARM systems, such as 
Larchant [5, 10]. As a matter of fact, such algo- 
rithms are not scalable because they require the 
underlying communication layer to support causal 
delivery. 

So, in conclusion, classical DGC algorithms, such as IRC and SSP Chains, are not safe for WARM systems but promise to be scalable, in particular because they do not require causal delivery; on the other hand, WARM-specific DGC algorithms, such as Larchant, deal safely with replication but lack scalability.

Thus, the main contribution of this work is the 
following: showing how classical DGC algorithms 
(conceived for function-shipping based systems) can 
be extended to handle replication while keeping 
their scalability. 

We do not address the issue of fault-tolerance, i.e. how the algorithm behaves in the presence of communication failures and process crashes is out of the scope of the paper. However, solutions similar to those found in classical DGC algorithms can also be applied (for example, leases as in RMI [18]).

from the enclosing process’s local root.
² In distributed systems with replicated data, an “acquire” operation allows a process to update its local replica of a particular object with the contents of another replica of that same object, residing in some other process, by means of a data-shipping mechanism.


This paper is organized as follows. In Section 2 
we present the model of a WARM for which the 
DGC was defined. The DGC algorithm is described 
in Sections 3 and 4. Section 5 highlights some of 
the most important implementation aspects. Sec- 
tion 6 presents some performance results from a real 
application. The paper ends with some related work
and conclusions in Sections 7 and 8, respectively.


2 WARM Model 


This section presents the model for Wide Area Replicated Memory (WARM). A WARM is a replicated distributed memory spanning several processes. These processes are connected in a network and communicate only by asynchronous message passing. We indicate that a message M has been sent from process i to process j as <send.M>_{i→j}; the delivery of that message is noted <deliver.M>_j.


In a WARM, the only way to share information is by replication of data, which can be done with a DSM-based mechanism [12]. Thus, processes do not use Remote Procedure Call (RPC) to access remote data.

It is worth noting that application code inside a process never sends messages explicitly. Instead, application code always accesses data locally; transparently to the application code, the WARM run-time system is responsible for replicating data locally when needed.


Each participating process in the WARM encloses, at least, the following entities: memory, mutator³, and a coherence engine. In our WARM model, for each one of these entities, we consider only the operations that are relevant for GC purposes.


We believe that our model is sufficiently gen- 
eral to describe most distributed systems support- 
ing wide area applications using data shipping. This 
model clearly defines the environment for which the 
DGC algorithm is conceived. 


³ The term mutator [7] designates the application code which, from the point of view of the garbage collector, mutates (or modifies) the reachability graph of objects.


2.1 Memory Organization


An object is defined to be a consecutive sequence of bytes in memory. Applications can have different views of objects and can see them as language-level class instances, memory pages, database records, web pages, etc.

Objects can contain references pointing to other objects. An outgoing inter-process reference is a reference to a target object in a different process. An incoming inter-process reference is a reference to an object that is pointed to from a different process. Our model does not restrict how references are actually implemented. They can be virtual memory pointers, URLs, etc.

An object is said to be reachable if it is attain- 
able directly or indirectly from a GC root (defined 
in Section 3.1). An object is said to be unreach- 
able if there is no reference path (direct or indirect) 
from a GC root leading to that object. 
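
The definition of reachability above can be sketched as a simple graph traversal. This is only an illustration: the function name `reachable_from` and the representation of the object graph as a dictionary are assumptions of this example, and the contents of a GC root are those defined in Section 3.1.

```python
def reachable_from(roots, refs):
    """Return the set of objects attainable, directly or indirectly, from roots.

    refs maps each object to the objects it references.
    """
    seen = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue  # already visited; also protects against cycles
        seen.add(obj)
        stack.extend(refs.get(obj, ()))
    return seen


refs = {"root": ["x"], "x": ["z"], "w": ["z"]}
live = reachable_from(["root"], refs)
print(sorted(live))  # ['root', 'x', 'z']; w is unreachable
```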

The unit of coherence is the object. Any object can be replicated (i.e. cached) in any process. A replica of object x in process i is noted x_i. Each process can cache a replica of any object for reading or writing according to the coherence protocol being used.


2.2 Mutator model 


The single operation executed by mutators, which 
is relevant for GC purposes, is reference assign- 
ment; this is the only way for applications to mod- 
ify the graph of objects. 

The reference assignment operation executed by a mutator in some process i is noted <x := y>_i. This means that a reference contained in object x is assigned the value of a reference contained in object y.⁴ This assignment operation results in the creation of a new inter-process reference from x to z, as illustrated in Figure 2.

Obviously, other assignments can delete references, turning objects into garbage. For example, in Figure 2, if the mutators in processes i and j perform <x := 0>_i and <y := 0>_j, object z becomes unreachable, i.e. garbage, given that there are no references pointing to it.

In conclusion, assignment operations (done by mutators) modify the object graph by either creating or deleting references. An object becomes unreachable when the last reference to it disappears; when this occurs, such an object can be safely reclaimed by the garbage collector because there is no possibility for any process to access it.

⁴ This notation is not fully accurate but it simplifies the explanation of the DGC algorithm. As a matter of fact, to be more precise we should write x.ref = y.ref (C++ style notation). However, this improved precision is not important for the DGC algorithm description and would complicate it unnecessarily.

[Figure 2 diagram: panel “before <x := y>_i”; process k]
Figure 2: Creation of a new inter-process reference to object z through an assignment operation.


2.3 Coherence Model 


The coherence engine is the entity of the WARM that is responsible for managing the coherence of replicas. The coherence protocol effectively used varies from system to system and depends on several factors such as the number of replicas, distances between processes, and others. However, the only coherence operation which is relevant for GC purposes is the propagation of an object, i.e. the replication of an object from one process to another. The propagation of an object y from process i to process j is noted propagate(y)_{i→j}.

We assume that any process can propagate a replica into itself as long as the mutator causing the propagation holds a reference to the object being propagated. Thus, if an object x is locally unreachable in process i, the mutator in that process cannot force the propagation of x to some other process; however, if some other process j holds a reference to x, it can request x to be propagated from i to j (as occurs in Figure 1).

We assume that, in each process, the coherence engine holds two data structures, called inPropList and outPropList; these indicate the process from which each object has been propagated, and the processes to which each object has been propagated, respectively⁵. Thus, each entry of the inPropList/outPropList contains the following information (see Figure 3):

• propObj - the reference of the object that has been propagated into/to a process;

• propProc - the process from/to which the object propObj has been propagated;

• sentUmess/recUmess - bit indicating if an unreachable message (more details in Section 3.2) has been sent/received.

⁵ Usually, this information does exist in the coherence engine in order to manage the replicas.

[Figure 3 diagram: an inPropList/outPropList entry with fields propObj, propProc, and sentUmess/recUmess]
Figure 3: inPropList and outPropList internal data.

[Figure 4 diagram: processes i and j before and after propagate(y)_{j→i}; root; inPropList]
Figure 4: Coherence engine propagates object y from process j to process i. The dashed line of y_i means that initially, in process i, y is not yet replicated in i.
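
The entries of inPropList and outPropList described above could be represented as follows. This is a minimal sketch: the class name `PropEntry` and the use of a single `umess` field for both sentUmess and recUmess are assumptions of this example, not the layout used by the authors.

```python
from dataclasses import dataclass


@dataclass
class PropEntry:
    prop_obj: str        # propObj: the propagated object
    prop_proc: str       # propProc: process it came from / went to
    umess: bool = False  # sentUmess (inPropList) or recUmess (outPropList)


# State after propagate(y) from process j to process i (Figure 4).
in_prop_list_i = [PropEntry("y", "j")]   # i received y from j
out_prop_list_j = [PropEntry("y", "i")]  # j sent y to i
print(in_prop_list_i[0], out_prop_list_j[0])
```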


When an object is propagated to a process we 
say that its enclosed references are exported from 
the sending process to the receiving process; on the 
receiving process, i.e. the one receiving the propa- 
gated object, we say that the object references are 
imported. 

Figure 4 illustrates the effect of a propagation. Object z has no replicas. Initially, only process j caches a replica of y; thus, both outPropList and inPropList of processes j and i are empty. In addition, y_j points to z. After y has been replicated from process j to process i, a new inter-process reference from y_i to z is created; this is due to the fact that the reference to z was exported from process j to (be imported by) process i. The inPropList and outPropList reflect this situation.

In order to understand how the DGC algorithm works it is important to emphasize the following aspects concerning the creation of inter-process references. A process can create an inter-process reference only through the execution of one of two operations: (i) reference assignment, which is performed explicitly by the mutator, and (ii) object propagation, which is performed by the coherence engine in order to allow the mutator to access some object⁶.


3 Distributed Garbage Collection Algorithm


In this section we describe the DGC algorithm and 
its data structures. Then, in Section 4 we go into 
more detail by describing a prototypical example 
which addresses all the aspects of the DGC algo- 
rithm. 

The DGC algorithm is a hybrid of tracing and reference-counting. Thus, each process has two GC components: a local tracing collector, and a distributed collector. Each process does its local tracing independently from any other process. The local tracing can be done by any mark-and-sweep based collector. The distributed collectors, based on reference-counting, work together by exchanging asynchronous messages, as described in the following sections. In the rest of the paper we focus on distributed collection.


3.1 Data Structures 


A stub describes an outgoing inter-process refer- 
ence, from a source process to a target process. A 
scion describes an incoming inter-process reference, 
from a source process to a target process. It is im- 
portant to note that stubs and scions do not impose 
any indirection on the native reference mechanism. 
In other words, they do not interfere either with 
the structure of references or the invocation mech- 
anism. They are simply GC specific auxiliary data 
structures. 

A stub stores in its internal data structures the following information:

• OutRef - the reference of the target object;

• SourceObj - the reference of the local object containing the outgoing inter-process reference;

• Scion - the identification of the corresponding scion; and

• Chain - the identification of a stub or a scion in the same process.

A scion stores in its internal data structures the following information:

• InRef - the reference of the target object;

• Stub - the identification of the corresponding stub; and

• Chain - the identification of a stub or a scion in the same process.

⁶ For example, in some DSM-based systems, when the mutator tries to access an object that is not yet cached locally, a page fault is generated; then, this fault is automatically recovered by the coherence engine that obtains a replica of the faulted object from some other process.


Finally, a process’s GC root includes: (i) the 
local root, i.e. stacks and static variables, (ii) the 
set of scions of that process, and (iii) the lists inPro- 
pList and outPropList. 
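
The stub and scion records and the composition of a GC root can be sketched as below. The field names mirror those listed above; everything else (the `gc_root` helper, plain strings as object references and identifiers, and propagation lists reduced to lists of object names) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stub:
    out_ref: str                 # OutRef: the target object
    source_obj: str              # SourceObj: local object holding the reference
    scion: str                   # Scion: id of the corresponding scion
    chain: Optional[str] = None  # Chain: id of a stub or scion in the same process


@dataclass
class Scion:
    in_ref: str                  # InRef: the target object
    stub: str                    # Stub: id of the corresponding stub
    chain: Optional[str] = None  # Chain: id of a stub or scion in the same process


def gc_root(local_root, scions, in_prop_list, out_prop_list):
    """Objects from which a local trace starts: (i) the local root,
    (ii) the targets of the scions, (iii) objects in the propagation lists."""
    roots = set(local_root)
    roots |= {s.in_ref for s in scions}
    roots |= set(in_prop_list) | set(out_prop_list)
    return roots


# Initial situation of Section 4: scion s1 in process k protects z.
s1 = Scion(in_ref="z", stub="s2")
s2 = Stub(out_ref="z", source_obj="y", scion="s1")
print(gc_root([], [s1], [], []))  # {'z'}
```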


3.2 Algorithm 


The local and distributed collectors depend on each 
other to perform their job in the following way. A 
local collector running inside a process traces the 
object graph locally cached; the starting point of 
the trace is the process’s GC root. A local trac- 
ing generates a new set of stubs; it is based on this 
new set that the distributed collector, in that pro- 
cess, may decide to update remote scions in other 
processes. 


3.2.1 Local Collector 


The local collector starts the graph tracing from the process’s local root and set of scions. For each outgoing inter-process reference it creates a stub in the new set of stubs. Once this tracing is completed, every object locally reachable by the mutator has been found (e.g. marked, if a mark-and-sweep algorithm is used); objects not yet found are locally unreachable; however, they can still be reachable from some other process holding a replica of, at least, one of such objects (as is the case of x_i in Figure 1). To prevent the erroneous deletion of such objects, the collector traces the object graph from the lists inPropList and outPropList, and proceeds as follows.


• When a locally reachable object (previously discovered by the local collector) is found, the tracing along that reference path ends.

• When an outgoing inter-process reference is found the corresponding stub is created in the new set of stubs.

• For an object which is reachable only from the inPropList, a message unreachable is sent to the site from where that object has been propagated; this sending event is registered by changing a sentUmess bit in the corresponding inPropList entry from 0 to 1.⁷

  When an unreachable message reaches a process, this delivery event is registered by changing a recUmess bit in the corresponding outPropList entry from 0 to 1.

• For an object which is reachable only from the outPropList, and the enclosing process has already received an unreachable message from all the processes to which that object has been previously propagated, a reclaim message is sent to all those processes and the corresponding entries in the outPropList are deleted; otherwise, nothing is done.

  When a process receives a reclaim message it deletes the corresponding entry in the inPropList.
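
The handling of the two propagation lists described above can be sketched as follows. This is a simplified illustration under stated assumptions: only the objects listed directly in inPropList and outPropList are considered (not objects reachable through them), `PropEntry` and `scan_prop_lists` are hypothetical names, and messages are returned as tuples instead of being sent.

```python
from dataclasses import dataclass


@dataclass
class PropEntry:
    prop_obj: str
    prop_proc: str
    umess: bool = False  # sentUmess in inPropList, recUmess in outPropList


def scan_prop_lists(locally_reachable, in_prop, out_prop):
    """Return the (kind, object, destination) messages a local collection emits."""
    messages = []
    in_objs = {e.prop_obj for e in in_prop}
    out_objs = {e.prop_obj for e in out_prop}

    # Objects reachable only from the inPropList: send unreachable once.
    for e in in_prop:
        only_in = e.prop_obj not in locally_reachable and e.prop_obj not in out_objs
        if only_in and not e.umess:
            messages.append(("unreachable", e.prop_obj, e.prop_proc))
            e.umess = True  # sentUmess: 0 -> 1

    # Objects reachable only from the outPropList: reclaim once every
    # process the object was propagated to has reported it unreachable.
    for obj in sorted(out_objs):
        entries = [e for e in out_prop if e.prop_obj == obj]
        only_out = obj not in locally_reachable and obj not in in_objs
        if only_out and all(e.umess for e in entries):
            messages.extend(("reclaim", obj, e.prop_proc) for e in entries)
            out_prop[:] = [e for e in out_prop if e.prop_obj != obj]
    return messages


in_list = [PropEntry("x", "j")]
print(scan_prop_lists(set(), in_list, []))  # [('unreachable', 'x', 'j')]
```

A second collection with the same state emits nothing, because the sentUmess bit is already set.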


3.2.2 Distributed Collector 


The main ideas behind the DGC algorithm can be 
summarized as follows. 


• As already mentioned, an object can be reclaimed only when all its replicas are no longer reachable. This is ensured by tracing the object graph from the lists inPropList and outPropList; objects that are reachable only from these lists are not locally reachable (i.e. by the local mutator); however, they cannot be reclaimed without ensuring their global unreachability, i.e. that none of their replicas are accessible. This will be explained in detail in the following section.

• The DGC algorithm is independent of the particular coherence protocol implemented by the coherence engine. In other words, the DGC algorithm does not require waiting for replicas to be coherent.


⁷ Note that from now on, the replica is not reachable by the local mutator; if another propagate operation occurs bringing a new replica of that same object into the process, the old replica remains locally unreachable, and a new entry is created in the inPropList with the corresponding sentUmess set to 0.

message | sent/received by | sent when
unreachable | LGC/DGC | object replica is reachable only from the inPropList
reclaim | LGC/DGC | all object replicas are reachable only from the inPropLists
newSetStubs | DGC/DGC | a new set of stubs is available

Table 1: GC related messages.


• Whatever the coherence protocol, there is only one interaction with the DGC algorithm. This interaction is twofold: (i) immediately before a propagate message is sent, the references being exported (contained in the propagated object) must be found in order to create the corresponding scions, and (ii) immediately before a propagate message is delivered, the outgoing inter-process references being imported must be found in order to create the corresponding local stubs, if they do not exist yet.⁸

• From time to time, possibly after a local collection, the distributed collector sends a message called newSetStubs; this message contains the new set of stubs that resulted from the local collection; this message is sent to the processes holding the scions corresponding to the stubs in the previous stub set. In each of the receiving processes, the distributed collector matches the just received set of stubs with its set of scions; those scions that no longer have the corresponding stub are deleted.

• As previously described, when a local collection takes place two kinds of messages may be sent: unreachable and reclaim. On the receiving process, these messages are handled by the distributed collector, which performs the following operations: sets the recUmess bit in the corresponding outPropList entry, and deletes the corresponding entry in the inPropList, respectively.

• The DGC algorithm does not require the underlying communication layer to support causal delivery.
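
The scion matching performed when a newSetStubs message is received can be sketched as below. The function name `handle_new_set_stubs` and the representation of scions as a dictionary from scion identifier to a (stub process, stub identifier) pair are assumptions of this example.

```python
def handle_new_set_stubs(scions, sender, new_stub_ids):
    """Delete scions whose stub, located in `sender`, is not in the new set.

    scions maps a scion id to a (stub_process, stub_id) pair.
    Returns the ids of the deleted scions.
    """
    deleted = [
        scion_id
        for scion_id, (stub_proc, stub_id) in scions.items()
        if stub_proc == sender and stub_id not in new_stub_ids
    ]
    for scion_id in deleted:
        del scions[scion_id]
    return deleted


# Process i holds scion s6, whose stub s7 lives in process j.
scions_i = {"s6": ("j", "s7")}
print(handle_new_set_stubs(scions_i, "j", new_stub_ids={"s2"}))  # ['s6']
```

Scions whose stubs live in processes other than the sender are left untouched, since the message says nothing about them.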


Table 1 presents all the GC related messages of 
the model, the components responsible for sending 
and receiving them, and when they occur. In Ta- 
ble 2 we present all the events with impact on the 
GC and the corresponding actions taken. These two 
tables summarize the way GC is performed. In the 
next section we describe the DGC algorithm in more 
detail using a prototypical example. 


⁸ Note that this may result in the creation of chains of stub-scion pairs, as happens in the SSP Chains algorithm [16].


4 Prototypical Example


We use a prototypical example, illustrated in Fig- 
ures 5 and 6. This example evolves along a sequence 
of steps covering all the situations, relevant for GC, 
that occur in a WARM: (i) creation of a new out- 
going inter-process reference by means of a propa- 
gate operation, (ii) creation of a new outgoing inter- 
process reference by means of an assignment opera- 
tion, and (iii) deletion of outgoing inter-process ref- 
erences by means of assignment and propagate op- 
erations. We show how all these occurrences affect 
the GC specific data structures and messages. 

In the initial situation both x and y are cached in processes i and j. However, only the replica y_j points to object z in process k. There is a single stub-scion pair (s2-s1) describing the only outgoing inter-process reference from y_j to z. For the sake of simplicity of our description, we assume that this stub-scion pair is created when the system boots.⁹

Then, the sequence of steps of the prototypical 
example considers the following operations (see Fig- 
ures 5 and 6; the effects of the operations are shown 
in bold). 


Step 1 - Propagate y from process j to process i; this results in the creation of a new outgoing inter-process reference from object y in i to object z in k.

Step 2 - The operation <x := y>_i is performed by the mutator in i; this creates a new outgoing inter-process reference from object x in i to object z in k.

Step 3 - Propagate x from process i to process j; this results in the creation of a new outgoing inter-process reference from object x in j to object z in k.

Step 4 - The operation <y := 0>_j is performed by the mutator in j; this results in the deletion of an outgoing inter-process reference from object y in j to object z in k.


⁹ For example, the reference to z could be obtained from a name service.

event | occurs when | action taken
reference exported from a process | propagate an object | create scion
reference imported into a process | propagate an object | create stub
object replica reachable only from the inPropList | LGC runs | send unreachable message to the process with the corresponding outPropList entry; set the sentUmess bit accordingly
unreachable message received | unreachable message sent | set the recUmess bit accordingly; if all recUmess bits for a particular object are set, then send the corresponding reclaim messages and delete the outPropList entry
reclaim message received | reclaim message sent | delete corresponding inPropList entry
new set of stubs available | LGC runs | newSetStubs message sent to the processes holding the scions corresponding to the previous set of stubs
newSetStubs message received | newSetStubs message sent | compare stubs set with set of scions; delete scions with no corresponding stubs

Table 2: GC related events.


Step 5 - Propagate y from process j to process i; this results in the deletion of an outgoing inter-process reference from object y in i to object z in k.

Step 6 - The operation <x := 0>_i is performed by the mutator in i; this results in the deletion of an outgoing inter-process reference from object x in i to object z in k.

Step 7 - The mutator in j deletes the reference from the local root to object x.

Step 8 - The mutator in i makes x_i unreachable by deleting the reference from the local root; thus, every replica of x becomes garbage.


The prototypical example presented above has two parts: the first three steps result in the creation of new outgoing inter-process references; the last five steps result in z becoming unreachable. In the next sections we describe how the DGC works in order to deal with this prototypical example.


4.1 Creation of Outgoing Inter-process References


In the prototypical example, the creation of outgoing inter-process references occurs first by propagation (step 1), then by reference assignment (step 2), and finally by propagation again (step 3). We address these cases now.


4.1.1 Propagation 


The first operation in the prototypical example is propagate(y)_{j→i} (Figure 5, step 1). Immediately before this message is sent from process j, object y must be scanned for references being exported. For each one of these references, the corresponding scion must be created. In this case, y contains only one reference (pointing to z); the corresponding scion s3 is shown in bold. Note that the scion just created, through its Chain field, refers to the already existing stub s2 (describing the outgoing inter-process reference from object y to object z).

Immediately before propagate(y)_{j→i} is delivered in process i, object y has to be scanned for imported outgoing inter-process references in order to create the corresponding stubs in process i, if they do not exist yet. In the prototypical example, y contains a single reference and there is no stub describing it in process i. Thus, the corresponding stub s4 is created (shown in bold); this stub, through its internal data structures, refers to the scion previously created in process j. Then, the mutator may freely access object y in process i.

[Figure 5 diagrams, one panel per step, each showing processes i, j, and k:
Step 1: propagation of object y from process j to process i.
Step 2: creating a new inter-process reference through <x := y>_i.
Step 3: propagation of object x from process i to process j.
Step 4: deleting an inter-process reference through <y := 0>_j.]
Figure 5: Prototypical example (part 1).

[Figure 6 diagrams, one panel per step, each showing processes i, j, and k:
Step 5: propagation of object y from process j to process i.
Step 6: deleting an inter-process reference through <x := 0>_i.
Step 7: object x in process j becomes unreachable from the local root.
Step 8: object x in process i becomes unreachable from the local root.]
Figure 6: Prototypical example (part 2).

Thus, the information stored in the stub-scion pair just created, s4-s3, is the following:


• stub s4: OutRef refers to object z in process k, SourceObj refers to object y in process i, Scion identifies the corresponding scion s3 previously created in process j, and Chain is null;

• scion s3: InRef refers to object z in process k, Stub identifies the corresponding stub s4 in process i, and Chain refers to the stub describing the outgoing inter-process reference from object y to object z.


It is worth noting that the mutator does not have to be blocked while the GC specific operations mentioned above are executed (scanning the object being propagated and creating the corresponding scion and stub); such operations can be executed in the background.

To summarize, there are the following rules: 


Safety Rule I: Clean Before Send 
Propagate. Before sending a propagate 
message for an object y from a process j, y 
must be cleaned (i.e. it must be scanned for 
references) and the corresponding scions 
created in j. 


Safety Rule II: Clean Before Deliver 
Propagate. Before delivering a propa- 
gate message for an object y in a process i, 
y must be cleaned (i.e. it must be scanned 
for outgoing inter-process references) and 
the corresponding stubs created in i, if they 
do not exist yet. 
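
The two safety rules can be sketched as a pair of hooks run around a propagate message. This is an illustrative simplification: scions are modelled as (target, destination process) pairs and stubs as (source object, target) pairs, and the function names are assumptions of this example rather than the authors' implementation.

```python
def clean_before_send(refs, dest, scions):
    """Safety Rule I: create a scion for each reference being exported."""
    for target in refs:
        scions.add((target, dest))


def clean_before_deliver(obj, refs, local_objects, stubs):
    """Safety Rule II: create missing stubs for imported outgoing references."""
    for target in refs:
        if target not in local_objects:  # outgoing inter-process reference
            stubs.add((obj, target))     # no-op if the stub already exists


# Step 1 of the example: propagate y (which refers to z) from j to i.
scions_j, stubs_i = set(), set()
clean_before_send(["z"], "i", scions_j)
clean_before_deliver("y", ["z"], {"x", "y"}, stubs_i)
print(scions_j, stubs_i)  # {('z', 'i')} {('y', 'z')}
```

Using sets makes the "if they do not exist yet" condition of Rule II automatic: adding an existing stub changes nothing.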


4.1.2 Assignment 


The second step of the prototypical example is the execution of the operation <x := y>_i. This results in the creation of a new outgoing inter-process reference: from object x in process i to z in process k. There is absolutely no operation to be done on behalf of the DGC algorithm.

This might seem strange because, according to traditional reference counting algorithms [20], each time a reference is created, a counter (at least) must be incremented. In a WARM, where mutators may create inter-process references very easily and frequently, through a simple reference assignment operation, such an increment would be extremely inefficient. As a matter of fact, this would require instrumenting every reference assignment and incrementing a counter accordingly, possibly on some remote process. In the following sections it will become clear that such an increment (or equivalent operation) does not need to be performed immediately.


4.1.3 Propagation


The third step of the prototypical example is the propagation of object x from process i to process j. This results in the creation of a new outgoing inter-process reference: from object x in process j to object z in process k (shown in bold in Figure 5, step 3).

According to Safety Rule Clean Before Send Propagate, before the propagate message is sent, the following has to be done in process i: scan object x, find its enclosed references and create the corresponding scions. In this case, object x has only one reference; thus, as a result of the scan, scion s6 is created in process i (shown in bold, Figure 5, step 3).

In addition, stub s5 is created, describing the outgoing inter-process reference from object x in process i to object z in process k.¹⁰

According to Safety Rule Clean Before Deliver Propagate, before the propagate message is delivered in process j, object x must be cleaned and the corresponding stub s7 created (shown in bold, Figure 5, step 3).


4.2 Deletion of Outgoing Inter-process References


In the prototypical example, the deletion of outgo- 
ing inter-process references occurs first by reference 
assignment, then by propagation, then by reference 
assignment again, and finally by propagation again. 
After all these operations, object z is unreachable. 
We address these steps now. 


4.2.1 Assignments and Propagations 


The fourth step of the prototypical example is the execution of the operation <y := 0>_j. This results in the deletion of the outgoing inter-process reference from object y to object z (Figure 5, step 4). At this moment, there is absolutely no operation to be done for GC purposes.

The fifth step of the prototypical example is propagate(y)_{j→i}. Given that the replica that is being propagated to i no longer points to any object, after the propagate is delivered, the outgoing inter-process reference from object y in process i to z is (implicitly) deleted (Figure 6, step 5). At this moment, there is absolutely no operation to be done for GC purposes. Note that, given that the object being propagated contains no references, both safety rules do not imply the execution of any particular operation.

¹⁰ Note that if a local collection had previously taken place in process i, stub s5 would have been already created.

[Figure 7 diagram: timeline t showing the initial situation after the 8th step, followed by the 1st to 6th LGC]
Figure 7: Timeline describing the GC operations after the 8th step of the prototypical example.

The sixth step of the prototypical example is the execution of the operation <x := 0>_i. This results in the deletion of the outgoing inter-process reference from object x in process i to object z in process k (Figure 6, step 6). At this moment, there is absolutely no operation to be done for GC purposes.

The seventh step of the prototypical example 
makes object x in process j unreachable from the 
local root. The last step makes object x in process 
i unreachable from the local root. In both cases 
there is absolutely no operation to be done for GC 
purposes. 

So far, the DGC has performed no operation. In particular, no scion has been deleted. Consequently, object z, which is no longer reachable, has not been reclaimed yet. This will happen only after its protecting scion s1 in process k is deleted and the local collector is executed. Now we address the modification and deletion of stubs and scions.


4.2.2 Collecting Garbage 


In step 8 of the prototypical example we see that object z will be reclaimed by the local collector in process k only after its protecting scion s1 has been deleted. This scion will be deleted only after the corresponding stub s2 in process j has disappeared; this will occur only after the whole chain of stub-scion pairs s7-...-s3 gets deleted.

According to Section 3.2, the stubs and scions will disappear as a result of the local and distributed collectors in processes i and j, as explained now (see Figure 7).


1st LGC - The local collector in process i de- 
tects that object x is reachable only from the 
inPropList; thus, a message unreachable is sent 
to process j and the corresponding sentUmess 
bit is set. 


When this message is delivered in process j, the 
recUmess bit in the corresponding entry of out- 
PropList is set. 


2nd LGC - The local collector in process j de- 
tects that object x is reachable only from the 
outPropList and the corresponding entry has its 
recUmess bit set to one; thus a message reclaim 
is sent to process i and the entry in the outPro- 
pList is deleted. 


When this message is delivered in process i, the 
corresponding entry in inPropList is deleted. 


3rd LGC - As a result of a local collection in process j, x is reclaimed and, consequently, stub s7 describing its outgoing inter-process reference to object z is not in the new set of stubs. This new set of stubs is sent as a newSetStubs message from process j to process i; then, the distributed collector in i deletes the corresponding scion s6.

Note that stub s2, in spite of the fact that y in j holds no outgoing inter-process reference anymore, is still in the new set of stubs because it is reachable from scion s3 through its Chain data structure.


4th LGC - As a result of a local collection in 
process i, object x is reclaimed and the new 
set of stubs does not contain any stub (s5 and 
s4, in particular) because there are no outgoing 
inter-process references. 


This new set of stubs is sent as a newSetStubs 
message from process i to process j; then, the 
distributed collector in j deletes the correspond- 
ing scion s3. 


5th LGC - As a result of a local collection in process j a
new set of stubs is generated in which there is no stub
(i.e. s2) because there are no outgoing inter-process
references.

This new set of stubs is sent as a newSetStubs message from
process j to process k; then, the distributed collector in k
deletes the corresponding scion s1.


6th LGC - Finally, a local collection occurs in 
process k and object z is reclaimed. 
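The unreachable/reclaim exchange of the 1st and 2nd LGC steps can be sketched as follows. This is only an illustrative model, not the NG code: the class and method names are hypothetical, objects are identified by plain strings, and message delivery is simulated by direct method calls. Only the names inPropList, outPropList, sentUmess and recUmess come from the text.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical propagation-list entry; the two bits are named as in the text. */
class PropEntry {
    boolean sentUmess; // "unreachable" already sent (inPropList side, process i)
    boolean recUmess;  // "unreachable" already received (outPropList side, process j)
}

/** Models process i (holds inPropList) and process j (holds outPropList) together. */
class PropagationHandshake {
    final Map<String, PropEntry> inPropList = new HashMap<>();  // at process i
    final Map<String, PropEntry> outPropList = new HashMap<>(); // at process j

    /** 1st LGC, process i: returns true if an "unreachable" message was sent. */
    boolean collectInProcessI(String obj, boolean reachableOnlyFromInPropList) {
        PropEntry e = inPropList.get(obj);
        if (e == null || !reachableOnlyFromInPropList || e.sentUmess) {
            return false;
        }
        e.sentUmess = true;
        deliverUnreachableToJ(obj); // assumption: delivery simulated synchronously
        return true;
    }

    /** Delivery of "unreachable" in process j: set the recUmess bit. */
    void deliverUnreachableToJ(String obj) {
        PropEntry e = outPropList.get(obj);
        if (e != null) {
            e.recUmess = true;
        }
    }

    /** 2nd LGC, process j: returns true if a "reclaim" message was sent. */
    boolean collectInProcessJ(String obj, boolean reachableOnlyFromOutPropList) {
        PropEntry e = outPropList.get(obj);
        if (e == null || !reachableOnlyFromOutPropList || !e.recUmess) {
            return false;
        }
        outPropList.remove(obj);
        deliverReclaimToI(obj);
        return true;
    }

    /** Delivery of "reclaim" in process i: delete the inPropList entry. */
    void deliverReclaimToI(String obj) {
        inPropList.remove(obj);
    }
}
```

In this model process j cannot send reclaim before it has received unreachable, which mirrors the ordering of the first two steps above.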


In conclusion, we have the following rule for repli- 
cated objects: 


Safety Rule III: Union Rule. A target object z is considered
unreachable only if the union of all the replicas of the
source objects does not refer to it.


In the prototypical example the objects pointing 
to z were the replicas of x and y. From Figure 7 it is 
clear that the union rule is respected. In addition, 
it is clear that there is no need for causal delivery 
to be ensured by the communication layer. 
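The union rule can be illustrated with a minimal sketch. It assumes, purely for illustration, that each replica of a source object is summarised as the set of object names it refers to, keyed by the process holding the replica; the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class UnionRule {
    /**
     * Returns true only if no replica of the source object refers to target.
     * replicaRefs maps a process id to the references held by the replica
     * cached in that process.
     */
    static boolean isUnreachable(String target, Map<String, Set<String>> replicaRefs) {
        Set<String> union = new HashSet<>();
        for (Set<String> refs : replicaRefs.values()) {
            union.addAll(refs); // union over all replicas, coherent or not
        }
        return !union.contains(target);
    }
}
```

A single stale replica that still refers to z is enough to keep z alive, which is why the rule does not depend on the replicas being coherent.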


5 Implementation 


We implemented our WARM distributed and lo- 
cal garbage collectors within a system called News 
Gathering (NG). In this section we briefly describe 
the NG application; then, we focus on the most im- 
portant implementation aspects of the DGC: how 
the safety rules are implemented, and the stub/scion 
data structures. 


5.1 NG Application 


NG is a web-based client-server application that we
developed to support the sharing of files over the web by
means of replication [19]. From the user's point of view the
client side of NG is a normal web browser with an extra menu
button called "make-replica". This function allows the user
to propagate a file to his machine, i.e., to create a local
replica of the file he is looking at. Once replicated, the
file can be freely accessed with any other application
(possibly causing the replicas to diverge). Later, this
replica can be propagated back to the site from which it came
by means of a make-replica operation performed by another
user running on that site. (Figure 8 illustrates the general
architecture of this application.)

With NG, a typical user in site S1 browses the 
web (web servers supporting the NG application) 
and makes-replicas of some of the pages from, for 
example, the S2 site. These pages are then edited 
by the user and, once ready, are made available from 








[Figure 8 diagram: NG clients (each with a web browser
component inside) and NG servers (web servers with servlets);
files are accessed with any tool and made available by means
of a web server.]

Figure 8: General architecture of the NG application.
Obviously, any number of sites is supported and not all are
forced to have both a client and a server, i.e. some can be
just clients or servers.


[Figure 9 diagram: a user in site S1 and the site S2. Labels:
i) browse the NG site S2; iii) edit the replica (URLs to other
pages at S2 and other sites); iv) make the replica available
through the NG server in site S1.]

Figure 9: Example of NG usage: i) browse the S2 web site,
ii) make-replicas of a page, iii) edit the replica, and
iv) make the replica available for others.


the user’s local NG server. These replicas may hold 
references to other (not locally replicated) S2 pages. 
Thus, it is desirable that such pages in the S2 web 
site remain available as long as there are references 
pointing to them. Figure 9 illustrates this scenario. 

The NG application, due to the WARM distributed garbage
collector, ensures that such pages at the S2 site remain
there as long as they are pointed to from some other NG site.
In addition, files at the S2 site which are no longer
referenced from any other NG site are automatically deleted
by the garbage collector. This means that neither dangling
references nor memory leaks occur.

The NG application is implemented in Java; this 
includes the client code (that uses the Microsoft In- 
ternet Explorer component) and the servlets run- 
ning within an Apache web server. 


5.2 Distributed Garbage Collector 


All the code of the local and distributed collectors is 
written in Java. The local collector is implemented 
as a stand-alone application. The distributed collec- 
tor is implemented by the servlets and by the client. 
[Figure 10 timeline. Labels: make-replica (F1); server: scan
F1, create scions, store URLs in auxiliary file; server: find
auxiliary file, create scions; client C1: scan F1, create
stubs; client C2: scan F1, create stubs; time to perform a
make-replica.]

Figure 10: Propagation of file F1.


Basically, the code in the servlets implements the safety
rule Clean Before Send Propagate (applied when a make-replica
is requested); the client code implements the safety rule
Clean Before Deliver Propagate (applied when the reply to a
make-replica request is received). The implementation of
these rules consists of scanning the web pages being
propagated and creating the corresponding scions (at the
server) and stubs (at the client).

The first time a file is propagated, its contents are scanned
at the server site, the corresponding scions are created, and
the enclosed set of URLs is kept in an auxiliary file. Later,
if this same page is propagated again, it only has to be
scanned again at the server site if it has been modified
since the last scan. The timeline presented in Figure 10
shows how the scanning needed to enforce safety rules I and
II relates to the make-replica request of file F1.
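The "scan once, rescan only after modification" policy can be sketched as below. This is a hedged illustration rather than the NG servlet code: the class name, the in-memory map standing in for the auxiliary file, and the simple href regular expression are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScanCache {
    // Assumption: URLs are taken from double-quoted href attributes only.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Stands in for the auxiliary file: URLs found by the last scan. */
    private static final class Entry {
        final long scannedVersion; // modification time of the scanned contents
        final List<String> urls;
        Entry(long scannedVersion, List<String> urls) {
            this.scannedVersion = scannedVersion;
            this.urls = urls;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    int scans = 0; // number of real scans performed, for illustration

    /** Extracts the URLs enclosed in an HTML page. */
    static List<String> scan(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    /** Returns the enclosed URLs, rescanning only if the file changed. */
    List<String> urlsFor(String file, long lastModified, String html) {
        Entry e = cache.get(file);
        if (e == null || lastModified > e.scannedVersion) {
            e = new Entry(lastModified, scan(html));
            cache.put(file, e);
            scans++;
        }
        return e.urls;
    }
}
```

A scion would then be created for each returned URL; that step is omitted here.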

Another important aspect concerning the imple- 
mentation of the garbage collectors (both local and 
distributed) is the data structures supporting the 
stubs and scions. These were conceived taking into 
account their use, in particular, to optimize the kind 
of information exchanged between sites that occurs 
when a newSetStubs message is sent. 

This message implies that the new set of stubs, resulting
from a local collection, is sent to the processes holding the
scions corresponding to the stubs in the previous stub set.
Then, in each of the receiving processes, the distributed
collector matches the just received set of stubs with its set
of scions; those scions that no longer have a corresponding
stub are deleted.

11 Note that the client can scan the page immediately after
sending a make-replica request because its contents are
already available locally (for browsing).

Thus, stubs are grouped by site, i.e. there is one hash table
for each site holding scions corresponding to the stubs in
that table. Sending a new set of stubs to a particular site
is just a matter of sending the new hash table. The same
reasoning applies to scions: they are stored in hash tables,
each table grouping the scions whose corresponding stubs are
in the same site.
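The per-site grouping and the handling of a newSetStubs message can be sketched as follows. The sketch makes two assumptions that are not taken from the paper: a stub and its scion are matched through a shared string identifier (for example the target URL), and each per-site table is reduced to a set of such identifiers. All class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class StubScionTables {
    /** Local stubs, grouped by the site holding the corresponding scions. */
    final Map<String, Set<String>> stubsBySite = new HashMap<>();
    /** Local scions, grouped by the site holding the corresponding stubs. */
    final Map<String, Set<String>> scionsBySite = new HashMap<>();

    void addStub(String site, String id) {
        stubsBySite.computeIfAbsent(site, k -> new HashSet<>()).add(id);
    }

    void addScion(String site, String id) {
        scionsBySite.computeIfAbsent(site, k -> new HashSet<>()).add(id);
    }

    /** Payload of a newSetStubs message for one site: a copy of its table. */
    Set<String> newSetStubsFor(String site) {
        Set<String> stubs = stubsBySite.get(site);
        return stubs == null ? new HashSet<String>() : new HashSet<>(stubs);
    }

    /** Receiver side: delete the scions that no longer have a stub. */
    Set<String> onNewSetStubs(String fromSite, Set<String> stubs) {
        Set<String> deleted = new HashSet<>();
        Set<String> scions = scionsBySite.get(fromSite);
        if (scions == null) {
            return deleted;
        }
        for (String id : scions) {
            if (!stubs.contains(id)) {
                deleted.add(id);
            }
        }
        scions.removeAll(deleted);
        return deleted;
    }
}
```

Because the tables are keyed by site, building the message for one destination never touches the stubs that belong to other sites.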


6 Performance 


In this section we present the most relevant per- 
formance results concerning the DGC. The critical 
performance results are those related to the imple- 
mentation of safety rules I and II. 

Thus, we downloaded a well-known web site (cnn.com) and ran
on each file the code implementing the safety rules. All
results were obtained on a local 100 Mbit/s network,
connecting PCs running Windows NT, each with 64 MB of memory
and a Pentium II at 233 MHz.

We downloaded all the 155 HTML files of the cnn.com web site
(see footnote 12) and obtained for each one the time it takes
to: scan it, create the corresponding stubs, and serialize
the hash table (including writing to disk). In this section,
for clarity, we simply refer to the time it takes to create
stubs and their size because the same values apply to scions.





[Table 3 columns: file size | number of URLs | scan time |
stub creation time | hash table size | time to serialize]











Table 3: Mean values obtained with all the files 
automatically downloaded from the cnn.com site 
(Sizes in bytes and times in milliseconds.). 


In Table 3 we present, for each one of the 155 
files: the mean file size, the mean number of URLs 
enclosed in each file, the mean time to scan a file, 
the mean time it takes to create a stub in the cor- 
responding hash table, the mean size of the hash 
table containing all the stubs corresponding to all 
the URLs enclosed in a file (that depends on the 
size of the corresponding URL), and the mean time 

12 Using an automatic tool called WebReaper, available from
http://www.otway.com/webreaper, configured with a depth
level of 5.


6th USENIX Conference on Object-Oriented Technologies and Systems 














[Table 4, layout lost. Columns: file | size | number of URLs |
scan time | stub creation time | hash table size | URLs size |
time to serialize.
Legible file names: health.htm, law.htm, main.htm,
politics.htm, showbiz.htm, space.htm, sports.htm.
Legible entries: politics.htm size 59079; space.htm size
58488; sports.htm size 41778, 366 URLs; one unnamed row reads
102933 | 491 | 45 | 10 | 26268 | 23465.]

Table 4: Values for the top-set group of files. (Sizes in bytes, times in milliseconds.)








[Table 5, layout lost. Columns: file | size | number of URLs |
scan time | stub creation time | hash table size | URLs size |
time to serialize.
Legible file names: 01/index.htm, 02/index.htm, 03/index.htm.
Legible entries: 02/index.htm size 56783; 03/index.htm size
31834.]

Table 5: Values for the branch-set group of files in the branch world/europe. (Sizes in bytes, times in milliseconds.)


it takes to serialize a hash table with all the stubs 
corresponding to a single file. 


However, in a normal browsing session, the user does not
make replicas of all the files. We expect the user to browse
a few top-level pages and then pick one or more branches of
the hierarchy. Some of these files will be replicated into
the user's local computer.


So, in order to obtain more realistic numbers, we proceeded
as follows. We picked 10 files from the top of the cnn.com
hierarchy. These files are mostly entry points to the others
with more specific contents. We call this set of files the
top-set. We also picked another 10 files representing a
branch of the cnn.com hierarchy. We call this set of files
the branch-set.
In Tables 4 and 5, for each file in the top-set and in the
branch-set, respectively, we present the times mentioned
above along with the size of each file and the number of URLs
enclosed.


These performance results are worst-case because they assume
all the URLs enclosed in a file refer to a file in another
site, which is not the usual case. However, they give us a
good notion of the performance limits of the current
implementation. In particular, we see that the most relevant
performance costs are due to the scanning of a file and the
serialization of the hash table. However, we believe that
these values are acceptable taking into account the
functionality of the system, i.e. it ensures that no dangling
references and no memory leaks occur. In addition, when a
user runs the NG browser and accesses any web page without
making a local replica of any file, there is absolutely no
performance overhead due to DGC.

We can also conclude that the size on disk of the hash table
containing all the stubs for a file is about half the size of
the HTML file. This rather large size is mostly due to the
URLs, which are responsible for about 90% of that size. The
size of the file containing the stubs can certainly be
reduced using regular compression techniques.
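As an illustration of the last remark, the following sketch serializes a table of stubs with and without GZIP compression, using only the standard java.io and java.util.zip classes. It is not part of the measured implementation: the table layout (URL mapped to a site name), the class name and the sample data are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class StubTableCodec {
    /** Serializes the table, optionally through a GZIP stream. */
    static byte[] write(HashMap<String, String> table, boolean compress) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            OutputStream sink = compress ? new GZIPOutputStream(bytes) : bytes;
            // Closing the object stream also finishes the GZIP stream.
            try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
                out.writeObject(table);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads back a table written with compress == true. */
    @SuppressWarnings("unchecked")
    static HashMap<String, String> readCompressed(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return (HashMap<String, String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Hypothetical sample: n stubs whose URLs share a long common prefix. */
    static HashMap<String, String> sampleTable(int n) {
        HashMap<String, String> table = new HashMap<>();
        for (int k = 0; k < n; k++) {
            table.put("http://www.cnn.com/world/europe/page" + k + ".htm", "S2");
        }
        return table;
    }
}
```

Since URLs from one site share long prefixes, a general-purpose compressor is likely to do well on such tables; the actual ratio for the cnn.com data was not measured here.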


7 Related work 


Much previous work in distributed garbage collec- 
tion, such as SSP Chains [16] or Network Objects 
[4, 18], considers processes communicating by mes- 
sages (without shared memory), using a hybrid of 
tracing and counting. Each process traces its inter- 
nal pointers; references across process boundaries 
are counted as they are sent in messages. 

Some object-oriented databases use a similar approach
[2, 6, 21], i.e. a partition can be collected independently
from the rest of the database. In particular, Thor is a
research OODB [13] that stores data in a small number of
servers. This data is cached at workstations for processing.
A Thor server counts references contained in objects cached
at a client. Thor defers counting references originating from
some object x cached at a client, until x is modified at the
server.

The work most directly related to this one is Skubiszewski
and Porteix's GC-consistent cuts [17]. They consider
asynchronous tracing of an object-oriented database, with no
distribution or replication. The collector is allowed to
trace an arbitrary database page at any time, subject to the
following ordering rule. For every transaction accessing a
page traced by the collector, if the transaction copies a
pointer from one page to another, the collector either traces
the source page before the write, or traces both the source
and the destination page after the write. The authors prove
that this is a sufficient condition for safety and liveness.

Most previous work on garbage collection in 
shared memory deals either with multiprocessors 
[3, 8] or with a small-scale DSM [9, 11]. These au- 
thors make strong coherence assumptions, and they 
ignore the fundamental issue of scalability. 

Yu and Cox [22] describe a conservative col- 
lector for the TreadMarks DSM system. It uses 
partitioned GC on a process basis; it is strongly 
integrated with TreadMarks and all messages are 
scanned for possible contained pointers. 


Previous work in DGC such as IRC [14], SSP chains [16] and
Larchant [10] served as the starting point of the DGC
algorithm presented in this paper. Our new algorithm builds
on these previous algorithms in such a way that it combines
their advantages: no need for causal delivery support to be
provided by the underlying communication layer (from the
first two), and the capability to deal with replicated
objects (from Larchant).


8 Conclusions and Future Work 


In this paper we presented a new DGC algorithm 
for a WARM. The algorithm is general enough to be 
widely applicable given the minimal assumptions of 
the underlying model. 

The fundamental aspects of the DGC algorithm 
are the following. 


• It does not interfere with the protocol that keeps the
  replicas coherent among the participating processes. This
  means that the DGC does not require replicas to be
  coherent.

• It does not require causal delivery to be supported by the
  underlying communication layer. Given that supporting
  causal delivery in wide area networks is difficult and
  inefficient, this is a fundamental aspect to ensure the
  scalability of the DGC algorithm.

• It is safe in the presence of replicated objects, i.e. it
  respects the union rule.


We presented our DGC algorithm as an evolution of two
previous ones: SSP chains, a classical one designed for
distributed systems based on function-shipping with no
replication support, and Larchant, which is targeted to
distributed systems with replicated objects. However, it is
important to note that any classical distributed garbage
collection algorithm based on reference-counting can be used
instead of SSP Chains (e.g. IRC). The only requirement would
be its integration with the WARM in such a way that the
safety rules are respected.

Concerning future research directions, we intend 
to address the fault-tolerance of the DGC algorithm. 
In other words, we are starting to study how the 
DGC algorithm should be designed so that it can 
remain safe, live and complete in spite of process 
crashes and permanent communication failures. We 
are also investigating how the DGC algorithm is af- 
fected if the WARM is accessed using transactions. 
Abstract


Mainstream object-oriented languages, such as C++ and Java,
provide only a restricted form of polymorphic methods, namely
uni-receiver dispatch. In common programming situations,
developers must work around this limitation. We describe how
to extend the Java Virtual Machine to support multi-dispatch
and examine the complications that Java imposes on
multi-dispatch in practice. Our technique avoids changes to
the Java programming language itself, maintains source code
and library compatibility, and isolates the performance
penalty and semantic changes of multi-method dispatch to the
program sections which use it. We have micro-benchmark and
application-level performance results for a dynamic Most
Specific Applicable (MSA) dispatcher, a framework-based
Single Receiver Projections (SRP) dispatcher, and a tuned SRP
dispatcher. Our general-purpose technique provides smaller
dispatch latency than programmer-written double-dispatch code
with equivalent functionality.


1 Introduction 


Object-oriented (OO) languages provide powerful tools 
for expressing computations. One key abstraction is the 
concept of a type hierarchy which describes the relation- 
ships among types. Objects represent instances of these 
different types. Most existing object-oriented languages 
require each object variable to have a programmer- 
assigned static type. The compiler uses this information 
to recognize some coding errors. The principle of sub- 
stitutability mandates that in any location where type T 
is expected, any sub-type of T is acceptable. But, substi- 
tutability allows that object variable to have a different 
(but related) dynamic type at runtime. 


Another key facility found in OO languages is method
selection based upon the types of the arguments. This method
selection process is known as dispatch. It can occur at
compile-time or at execution-time. In the former case, where
only the static type information is available, we have static
dispatch (method overloading). The latter case is known as
dynamic dispatch (dynamic method overriding or virtual
functions) and object-oriented languages leverage it to
provide polymorphism, i.e. the execution of type-specific
program code.


We can divide OO languages into two broad categories 
based upon how many arguments are considered dur- 
ing dispatch. Uni-dispatch languages select a method 
based upon the type of one distinguished argument; 
multi-dispatch languages consider more than one, and 
potentially all, of the arguments at dispatch time. For 
example, Smalltalk [14] is a uni-dispatch language. 
CLOS [23] and Cecil [6] are multi-dispatch languages. 
Other terms, like multiple dispatch, are used in the liter- 
ature. However, the term multiple dispatch is confusing 
since it can mean either successive uni-dispatches or a 
single multi-dispatch. In fact, in this paper, we compare 
multi-dispatch to double dispatch, which uses two uni- 
dispatches. 


C++ [24] and Java [15] are dynamic uni-dispatch languages.
However, for both languages, the compiler considers the
static types of all arguments when compiling method
invocations. Therefore, we can regard these languages as
supporting static multi-dispatch. Figure 1 depicts both
dynamic uni-dispatch and static multi-dispatch in Java.

Uni-dispatch limits the method selection process to consider
only a single argument, usually the receiver. This is a
substantial limitation and standard programming idioms exist
to overcome this restriction. As a motivation for
multi-dispatch, we describe one programming idiom that
demonstrates the need for multi-dispatch, describe
class Point {
  int x, y;
  void draw(Canvas c) { /* Point-specific code */ }
  void translate(int t) { x += t; y += t; }
  void translate(int tX, int tY) { x += tX; y += tY; }
}

class ColorPoint extends Point {
  Color c;
  void draw(Canvas C) { /* ColorPoint code */ }
}

// same static type, different dynamic types
Point Pp = new Point();
Point Pc = new ColorPoint();

// static multi-dispatch
Pp.translate(5);     // one int version
Pp.translate(1, 2);  // two int version

// dynamic uni-dispatch
Pp.draw(aCanvas);    // Point::draw()
Pc.draw(aCanvas);    // ColorPoint::draw()





Figure 1: Dispatch Techniques in Java 


how it can be replaced by multi-dispatch, list the ad- 
vantages of using multi-dispatch to replace the idiomatic 
code, and measure the cost of using multi-dispatch with 
one of our current multi-dispatch algorithms. 


1.1 Double Dispatch 


Double dispatch occurs when a method explicitly checks an
argument type and executes different code as a result of this
check. Double dispatch is illustrated in Figure 2(a) (from
Sun's AWT classes) where the processEvent(AWTEvent) method
must process events in different ways, since event objects
are instances of different classes. Since all of the events
are placed in a queue whose static element type is AWTEvent,
the compiler loses the more specific dynamic type
information. When an element is removed from the queue for
processing, its dynamic type must be explicitly checked to
pick the appropriate action. This is an example of the
well-known container problem [5].
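The container problem can be reproduced in a few lines of standard Java. The classes below are hypothetical stand-ins for AWTEvent and its subclasses, not the AWT classes themselves; the point is only that overload selection uses the static type of the argument.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class Event {}
class KeyPress extends Event {}

class Handler {
    String process(Event e)    { return "generic"; }
    String process(KeyPress e) { return "key"; }
}

class ContainerProblem {
    /** The event passes through a queue whose element type is Event. */
    static String viaQueue() {
        Queue<Event> queue = new ArrayDeque<>();
        queue.add(new KeyPress());
        Event e = queue.remove();       // static type is now Event
        return new Handler().process(e); // overload chosen at compile time
    }

    /** The event is passed with its precise static type. */
    static String direct() {
        return new Handler().process(new KeyPress());
    }

    public static void main(String[] args) {
        System.out.println(viaQueue()); // prints "generic"
        System.out.println(direct());   // prints "key"
    }
}
```

Recovering the "key" behaviour in viaQueue() requires an explicit instanceof test and a cast, which is exactly the double-dispatch code of Figure 2(a).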

Double dispatch suffers from a number of disadvan- 
tages. First, double dispatch has the overhead of in- 
voking a second method. Second, the double-dispatch 
program is longer and more complex; this provides 
more opportunity for coding errors. Third, the double- 
dispatch program is more difficult to maintain since 
adding a new event type requires not only the code to 
handle the new event, but another cascaded else if 
statement. 


The need for double dispatch develops naturally in several
common situations. Consider binary operations [4], such as
the compareTo(Object) method defined in interface Comparable.
The programmer must ascertain the type of the Object argument
before continuing to perform a type-specific comparison.
Another common use for double dispatch is in drag-and-drop
applications, where the result of a user action depends on
both the data object dragged and on the target object. A
generic drag-and-drop schema forces the programmer to test
data types and re-dispatch to a more specific method. A third
example is in event-driven programming. As we saw in
Figure 2, applications are written using base classes such as
Component and Event, but we need to take action based upon
the specific types of both Component and Event. Indeed, the
need for multi-dispatch is ubiquitous enough that two of the
original design patterns, Visitor and Strategy, are
work-arounds to supply multi-dispatch functionality within
uni-dispatch languages.


Consider how the AWT example could be re-written if dynamic
multi-dispatch were available in Java. An equivalent program,
partially using multi-dispatch, would resemble Figure 2(b).
For clarity, we have not completely converted the code to use
multi-dispatch; we maintain the case statement and double
dispatch to select among MouseEvent categories. A more
complete factoring of MouseEvent into MouseButtonEvent and
MouseMotionEvent would eliminate the remaining double
dispatch, resulting in a Full Multi-Dispatch version of the
code. The dynamic multi-dispatcher will select the correct
method at runtime based upon the dispatchable arguments in
addition to the receiver argument (the instance of
Component). Individual component types can still override the
methods that accept specific event types (e.g. KeyEvent,
FocusEvent) and will do so without invoking the
double-dispatch code.
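On an unmodified JVM, the effect of selecting a method by the dynamic type of an argument can be approximated with reflection. The sketch below is only an emulation written for illustration: it is not the modified JVM described in this paper, it handles a single argument, it looks only at methods declared in the receiver's own class, and it ignores interfaces. All class names are hypothetical.

```java
import java.lang.reflect.Method;

class Evt {}
class KeyEvt extends Evt {}
class MouseEvt extends Evt {}

class Comp {
    String processEvent(Evt e)    { return "generic"; }
    String processEvent(KeyEvt e) { return "key"; }
}

class ReflectiveDispatch {
    /**
     * Invokes the method called name whose parameter type is the closest
     * superclass of the argument's dynamic class.
     */
    static Object invoke(Object receiver, String name, Object arg) {
        for (Class<?> c = arg.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Method m = receiver.getClass().getDeclaredMethod(name, c);
                m.setAccessible(true);
                return m.invoke(receiver, arg);
            } catch (NoSuchMethodException notDeclared) {
                // No method for this class; try its superclass next.
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
        throw new IllegalArgumentException("no applicable method: " + name);
    }
}
```

For example, ReflectiveDispatch.invoke(new Comp(), "processEvent", e) with an e of static type Evt and dynamic type KeyEvt reaches the KeyEvt version. A reflective lookup on every call is costly, which is one reason to place the dispatcher inside the virtual machine instead.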


The multi-dispatch version is shorter and clearer. However,
it requires the Java Virtual Machine (JVM) [20] to directly
dispatch an Event to the correct processEvent(AWTEvent)
method. Our modified JVM provides this facility and correctly
executes the multi-dispatch code discussed above.
Furthermore, Table 1, a subset of Table 4, shows that
multi-dispatch is substantially faster than interpreted
double dispatch and even faster than JIT-ed double dispatch.
Note that the numbers in Table 1 are based on single-threaded
code.


Our experience with the Swing GUI classes [26] rein- 
forces our belief that double dispatch in AWT is a sig- 
nificant factor in Swing applications. First, Swing does 
not operate without AWT; instead each AWTEvent is 
accepted by a Swing JComponent. Therefore, every 
mouse-click and key-press is double dispatched through 
AWT into Swing. Next, Swing type-checks the event 
and double dispatches again. Internally, Swing avoids 
further double dispatch by coding the AWTEvent type 


USENIX Association 


package java.awt;
class Component {

  // double dispatch events to subComponent
  void processEvent(AWTEvent e) {
    if (e instanceof FocusEvent) {
      processFocusEvent((FocusEvent)e);
    } else if (e instanceof MouseEvent) {
      switch (e.getID()) {
        case MouseEvent.MOUSE_PRESSED:
        // ...
        case MouseEvent.MOUSE_EXITED:
          processMouseEvent((MouseEvent)e);
          break;
        case MouseEvent.MOUSE_MOVED:
        case MouseEvent.MOUSE_DRAGGED:
          processMouseMotionEvent((MouseEvent)e);
          break;
      }
    } else if (e instanceof KeyEvent) {
      processKeyEvent((KeyEvent)e);
    } else if (e instanceof ComponentEvent) {
      processComponentEvent((ComponentEvent)e);
    } else if (e instanceof InputMethodEvent) {
      processInputMethodEvent((InputMethodEvent)e);
    }
    // other events ignored by Component
  }

  void processFocusEvent(FocusEvent e) {...}
  void processMouseEvent(MouseEvent e) {...}
  void processMouseMotionEvent(MouseEvent e) {...}
  void processKeyEvent(KeyEvent e) {...}
  void processComponentEvent(ComponentEvent e) {...}
  void processInputMethodEvent(InputMethodEvent e) {...}
}


(a) Double Dispatch in Java 





package java.awt;
class Component {

  void processEvent(AWTEvent e) {...}

  void processEvent(MouseEvent e) {
    switch (e.getID()) {
      case MouseEvent.MOUSE_PRESSED:
      // ...
      case MouseEvent.MOUSE_EXITED:
        processMouseEvent((MouseEvent)e);
        break;
      case MouseEvent.MOUSE_MOVED:
      case MouseEvent.MOUSE_DRAGGED:
        processMouseMotionEvent((MouseEvent)e);
        break;
    }
  }

  void processEvent(FocusEvent e) {...}
  void processMouseEvent(MouseEvent e) {...}
  void processMouseMotionEvent(MouseEvent e) {...}
  void processEvent(KeyEvent e) {...}
  void processEvent(ComponentEvent e) {...}
  void processEvent(InputMethodEvent e) {...}
}


(b) Equivalent Code in Multi-Dispatch Java 


Figure 2: Double vs. Multi-Dispatch in Java 


into the selector (e.g. fireInternalEvent()). Despite the
limitations this imposes on the programmer, it is clear that
double dispatch is still the standard technique in Swing as
well.

Also, a multi-dispatch JVM could benefit other lan- 
guages. For example, Standard ML, Scheme, and Eiffel 
have implementations which generate JVM-compatible 
binary files. Extending these languages to include multi- 
dispatch semantics becomes straightforward. Unlike 
techniques based on source code translation, our multi- 
dispatch JVM can be directly used by other languages. 


The research contributions of this paper are: 


1. The design and implementation of an extended Java 
Virtual Machine that supports arbitrary-arity multi- 
dispatch with the properties: 

(a) The Java syntax is not modified. 
(b) The Java compiler is not modified. 


(c) The programmer can select which classes 
should use multi-dispatch. 

(d) The performance and semantics of uni- 
dispatch methods are not affected. 


(e) The existing class libraries are not affected. 


(f) The existing reflection API is preserved. 


2. The introduction of a dynamic version of Java's
static multi-dispatch algorithm.


3. The first performance results for table-based multi- 
dispatch techniques in a mainstream language. 


We begin by reviewing some important details about the 
uni-dispatch JVM. Next, we sketch our JVM modifica- 
tions to enable multi-dispatch. Then, we present experi- 
mental results for implementations of our multi-dispatch 
techniques. This is followed by a discussion of several 
complex issues that a practical multi-dispatch Java must 
address and a description of some of the details of our 
implementation. Finally, we close with a description of 
future work and a review of related approaches to multi- 
dispatch. 


2 Background 


The Java Programming Language [15] is a static 
multi-dispatch, dynamic uni-dispatch, dynamic loading 
Dispatch Type | Interpreter: time in µs (σ) | Interpreter: normalized
Double        | 0.91 (0.00)                 |
Multi         | 0.34 (0.00)                 | 0.37
Full Multi    | 0.32 (0.00)                 |

[OpenJIT columns (time in µs (σ), normalized): legible times
0.48 and 0.32, legible deviations (0.01) and (0.01); row
assignment not recoverable.]

Table 1: AWT Event Dispatch Comparison
(Call-site Dispatch Time in microseconds, Subset of Table 4)


object-oriented language. Our primary design goal is 
to extend the dynamic method selection to optionally 
and efficiently consider all arguments, without affecting 
the syntax of the language or any other semantics. Our 
secondary goals are to retain the dynamic and reflective 
properties of Java. 


In order to meet these goals, we chose to modify the 
JVM [20] implementation, rather than modifying the 
programming language itself. Java programs are com- 
piled by javac (or other compiler) into sequences of 
bytecodes — primitive operations of a simple stack- 
based computer. These bytecodes are interpreted by a 
JVM written for each hardware platform. We began 
with the classic VM (now known as the Research Virtual
Machine; see footnote 2) written in C and distributed by Sun
Microsystems, Inc. Other JVM implementations exist and
many include just-in-time (JIT) compiler technology to 
enhance the interpretation speed at runtime by replacing 
the bytecodes with equivalent native machine instruc- 
tions. At present, our modified JVM is compatible with 
the OpenJIT 1.1.15 [21] compiler. 


Before we look at how to implement multi-dispatch in 
the virtual machine, we first need to understand the bi- 
nary representation that the virtual machine executes, 
how method invocations are translated into the virtual 
machine code, and how the JVM actually dispatches the 
call-sites. 


2.1 Java Classfile format 


The JVM reads the bytecodes, along with some neces- 
sary symbolic information from a binary representation, 
known as a .class file. Each .class file contains a 
symbol table for one class, a description of its super- 
classes, and a series of method descriptions containing 
the actual bytecodes to interpret. We leverage the sym- 
bolic information, called the constant pool, to imple- 
ment multi-dispatch. 


Figure 3 shows the layout of the constant pool for the 
ColorPoint class shown in Figure 1. 


2 The Research Virtual Machine was initially released as the
classic reference VM. Sun later renamed it the Exact VM. With
the advent of the HotSpot VM, the classic VM was renamed
again, becoming the Research VM.


Conceptually, the constant pool consists of an array
containing text strings and tagged references to text
strings. In Figure 3, class Point is represented by a tag
entry at location 1 that indicates that it is a CLASS tag and
that we should look at constant pool location 2 for the name
text. Then, the constant pool contains the text string
"Point" at location 2. Therefore, a class symbol requires two
constant pool entries. Method references are similar, except
they require five constant pool entries.


 1  CLASS      #2               Point
 2  TEXT       “Point”
 3  CLASS      #4               ColorPoint
 4  TEXT       “ColorPoint”
 5  METHOD     #1 #6            Point::<init>:()V
 6  NAME&TYPE  #7 #8            and for our initializer
 7  TEXT       “<init>”
 8  TEXT       “()V”
 9  METHOD     #1 #10           Point::draw:(LCanvas;)V
10  NAME&TYPE  #11 #12          and for our method
11  TEXT       “draw”
12  TEXT       “(LCanvas;)V”
13  NAME&TYPE  #14 #15          used for our field
14  TEXT       [illegible]
15  TEXT       “Color”


Figure 3: A Simple Constant Pool 


In our example, constant pool location 9 contains the tag
declaring that it contains a METHOD. It references the
CLASS tag at location 1, to define the static type of the
class containing the method to be invoked. In this case,
the class happens to be Point itself, but, more often,
this is not the case. The METHOD entry also references
the NAME-AND-TYPE entry at location 10. This
NAME-AND-TYPE entry contains pointers to text entries at
locations 11 and 12. The first location, 11, contains the
method name, “draw”. The second location, 12, contains
an encoded signature “(LCanvas;)V” describing
the number of arguments to the method, their types, and
the return type from the method. In our example, we see
one class argument with name “Canvas” and that the
return type is void.
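

To make the encoded signature concrete, the following Java sketch decodes a method descriptor such as “(LCanvas;)V” into readable argument and return types. It is an illustration only, assuming well-formed input; the class name DescriptorSketch and its methods are hypothetical and are not part of the JVM or of the implementation described in this paper.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative parser for JVM method descriptors such as "(LCanvas;)V". */
final class DescriptorSketch {

    /** Returns the argument types followed by the return type, as readable names. */
    static List<String> parse(String descriptor) {
        if (descriptor.isEmpty() || descriptor.charAt(0) != '(') {
            throw new IllegalArgumentException("not a method descriptor: " + descriptor);
        }
        List<String> types = new ArrayList<>();
        int pos = 1;
        while (descriptor.charAt(pos) != ')') {
            pos = readType(descriptor, pos, types);
        }
        readType(descriptor, pos + 1, types); // the return type follows ')'
        return types;
    }

    /** Reads one type starting at pos and returns the position just past it. */
    private static int readType(String d, int pos, List<String> out) {
        int dims = 0;
        while (d.charAt(pos) == '[') { dims++; pos++; } // array dimensions
        String name;
        char c = d.charAt(pos);
        if (c == 'L') {                                 // class type: L<name>;
            int end = d.indexOf(';', pos);
            name = d.substring(pos + 1, end).replace('/', '.');
            pos = end + 1;
        } else {                                        // primitive type or void
            name = primitiveName(c);
            pos++;
        }
        StringBuilder sb = new StringBuilder(name);
        for (int i = 0; i < dims; i++) sb.append("[]");
        out.add(sb.toString());
        return pos;
    }

    private static String primitiveName(char c) {
        switch (c) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'V': return "void";
            default: throw new IllegalArgumentException("unknown type code: " + c);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("(LCanvas;)V")); // [Canvas, void]
        System.out.println(parse("()V"));         // [void]
    }
}
```

The last element of the returned list is always the return type, so the number of declared arguments is the list size minus one.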


2.2 Static Multi-Dispatch in Javac 


The Java compiler converts source code into a binary 
representation. When it encounters a method invocation,
javac must emit a constant pool entry that describes
the method to be invoked. It must provide an
exact description, so that, for instance, the two
translate(...) methods in Point can be distinguished at
runtime. Therefore, it must examine the types of the
arguments at a call-site and select between them. This
selection process, which considers the static types of all
arguments, can be viewed as a static multi-dispatch.


The Java Language Specification, 2nd Edition
(JLS) [15] provides an explicit algorithm for static
multi-dispatch called Most Specific Applicable (MSA).
At a call-site, the compiler begins with a list of all
methods implemented and inherited by the (static)
receiver type. Through a series of culling operations,
the compiler reduces the set of methods down to a single 
most specific method. The first operation removes 
methods with the wrong name, methods that accept 
an incorrect number of arguments, and methods that 
are not accessible from the call-site. This latter group 
includes private methods called from another class and 
protected methods called from outside of the package. 


Next, any methods which are not compatible with the
static type of the arguments are also removed. This
test relies upon testing widening conversions, where one
type $T_{sub}$ can be widened to another $T_{super}$ if and only
if $T_{sub}$ is the same type as $T_{super}$, or a subtype of $T_{super}$.
For example, a FocusEvent can be widened to an AWTEvent
because the latter is a super-type of the former³.
The opposite is not valid: an AWTEvent cannot
be widened to a FocusEvent; indeed a type-cast from
AWTEvent to FocusEvent would need to be a type-checked
narrowing conversion.


Finally, javac attempts to locate the single most specific
method among the remaining subset of statically applicable
methods. One method $M(T_{1,1},\ldots,T_{1,m})$ is considered
more specific than $M(T_{2,1},\ldots,T_{2,m})$ if and only
if each argument type $T_{1,i}$ can be widened to $T_{2,i}$ for
each $i = 1,\ldots,m$, and for some $j$, $T_{2,j}$ cannot be
widened to $T_{1,j}$. In effect, this means that any set of
arguments acceptable to $M(T_{1,1},\ldots,T_{1,m})$ is also
acceptable to $M(T_{2,1},\ldots,T_{2,m})$, but not vice versa.


Given the subset of applicable methods, javac selects
one, $M_t$, as its tentatively most specific. It then checks
each other candidate method $M_c$ by testing whether each of its
argument types can be widened to the corresponding argument
type in $M_t$. If this is successful, then $M_c$ is at least
as specific as $M_t$; the compiler adopts $M_c$ as the new
tentatively most specific method, and the method $M_t$ is
culled from the candidate list. If the first test, whether
$M_c$ can be widened to $M_t$, is unsuccessful, then the compiler
checks the other direction: can $M_t$ be widened to
$M_c$? If so, then the compiler drops $M_c$ from the candidate
list.


³ The JLS separately recognizes identity conversions (a FocusEvent
can be converted into a FocusEvent). Javac does not distinguish
them, so we do the same for our exposition.


Unfortunately, both tests can fail. To illustrate this, con- 
sider the first two methods in Figure 4. The first argu- 
ment of the first method (ColorPoint) can be widened 
to the type of the first argument of the second method 
(Point). But the opposite is true for the second ar- 
gument of each method. If we invoke colorBox with 
two ColorPoint arguments, both methods apply. If the 
third method was not present, we would have an ambigu- 
ous method error. The third method, taking two Color- 
Points, removes the ambiguity because it is more spe- 
cific than both of the other methods. It allows both of the 
others to be culled, giving a single most specific method. 


colorBox(ColorPoint p1, Point p2) {...}
colorBox(Point p1, ColorPoint p2) {...}


// conflict method removes ambiguity
colorBox(ColorPoint p1, ColorPoint p2) {...}





Figure 4: Ambiguous and Conflict Methods 
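

The selection rule and the ambiguity of Figure 4 can be sketched with Java reflection. This is a simplified illustration, not javac's algorithm: it looks only at methods declared directly in one class, handles reference types only, ignores accessibility, and searches for a candidate that is at least as specific as every other one instead of culling a candidate list. The names MsaSketch, TwoBoxes, ThreeBoxes, widens, and mostSpecific are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MsaSketch {
    static class Point { }
    static class ColorPoint extends Point { }

    /** The first two methods of Figure 4: ambiguous for two ColorPoints. */
    static class TwoBoxes {
        void colorBox(ColorPoint p1, Point p2) { }
        void colorBox(Point p1, ColorPoint p2) { }
    }

    /** All three methods of Figure 4: the third one removes the ambiguity. */
    static class ThreeBoxes {
        void colorBox(ColorPoint p1, Point p2) { }
        void colorBox(Point p1, ColorPoint p2) { }
        void colorBox(ColorPoint p1, ColorPoint p2) { }
    }

    /** True if every type in 'from' can be widened to the matching type in 'to'. */
    static boolean widens(Class<?>[] from, Class<?>[] to) {
        if (from.length != to.length) return false;
        for (int i = 0; i < from.length; i++) {
            if (!to[i].isAssignableFrom(from[i])) return false;
        }
        return true;
    }

    /** Returns the single most specific applicable method, or throws if there is none. */
    static Method mostSpecific(Class<?> owner, String name, Class<?>... argTypes) {
        // Step 1: keep methods with the right name whose parameters accept the arguments.
        List<Method> applicable = new ArrayList<>();
        for (Method m : owner.getDeclaredMethods()) {
            if (m.getName().equals(name) && widens(argTypes, m.getParameterTypes())) {
                applicable.add(m);
            }
        }
        // Step 2: look for a candidate whose parameters widen to those of every other one.
        for (Method candidate : applicable) {
            boolean beatsAll = true;
            for (Method other : applicable) {
                if (!widens(candidate.getParameterTypes(), other.getParameterTypes())) {
                    beatsAll = false;
                    break;
                }
            }
            if (beatsAll) return candidate;
        }
        throw new IllegalStateException(
                applicable.isEmpty() ? "no applicable method" : "ambiguous call");
    }

    public static void main(String[] args) {
        Method m = mostSpecific(ThreeBoxes.class, "colorBox", ColorPoint.class, ColorPoint.class);
        System.out.println(Arrays.toString(m.getParameterTypes()));
        try {
            mostSpecific(TwoBoxes.class, "colorBox", ColorPoint.class, ColorPoint.class);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // ambiguous call
        }
    }
}
```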


Primitive types⁴, when used as arguments, are tested at
compilation time in the same way as other types. Primi- 
tive widening conversions are defined which effectively 
impose a standard type hierarchy on the primitive types. 
The compiler inserts widening casts as needed. 


2.3 Dynamic Uni-Dispatch in the JVM


Now we turn our attention to dispatching polymorphic 
call-sites at runtime. Methods are stored in the .class 
file as sequences of virtual machine instructions. Within 
a stream of bytecodes, method invocations are represented
by invoke bytecodes that occupy three bytes⁵.
The first byte contains the opcode (0xb6 for
invokevirtual). The remaining two bytes form an index
into the constant pool. The constant pool must contain
a METHOD entry at the given index. This entry
contains the static type of the receiver argument (as
the CLASS linked entry), and the method name and
signature (through the NAME&TYPE entry). Figure 5
shows the pseudo-bytecode⁶ for invoking the method
Component.processEvent(AWTEvent) twice.


From the opcode, invokevirtual, the JVM knows 
that the next two bytes contain the constant pool index 
of a METHOD descriptor. From that descriptor, the JVM 
can locate the method name and signature. The JVM 
parses the signature to discover that the method to be 
invoked requires a receiver argument and one other ar- 
gument. Therefore, the JVM peeks into the operand 

⁴ Java provides non-object types byte, char, short, int, long,
float, and double. These are called primitive types.

⁵ The invokeinterface bytecodes occupy 5 bytes.


⁶ Rather than show constant pool indices, we show their values
directly.


Component aComponent = new SubComponent(...);
AWTEvent anEvent = new FocusEvent(...);
FocusEvent aFocusEvent = new FocusEvent(...);

aComponent.processEvent(anEvent);
aComponent.processEvent(aFocusEvent);


(a) Polymorphic Call-sites in Source.


apush aComponent
apush anEvent
invokevirtual Component::processEvent(LAWTEvent;)V

apush aComponent
apush aFocusEvent
invokevirtual Component::processEvent(LAWTEvent;)V


(b) Polymorphic Call-sites in Bytecodes.


Figure 5: Polymorphic Call-sites — two views 


stack and locates the receiver argument. At this point, 
the JVM has the information it needs to begin searching 
for the method to invoke. The JVM has the name, the 
signature, and the receiver of the message. 
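

As a small illustration of the three-byte encoding described above, the following Java sketch extracts the constant pool index that follows an invokevirtual opcode. The two operand bytes are combined high byte first, as the JVM specification defines. The class and method names are hypothetical, and the byte array stands in for a real method body.

```java
final class InvokeDecodeSketch {
    static final int INVOKEVIRTUAL = 0xb6;

    /** Returns the constant pool index encoded after an invokevirtual opcode at pc. */
    static int constantPoolIndex(byte[] code, int pc) {
        if ((code[pc] & 0xFF) != INVOKEVIRTUAL) {
            throw new IllegalArgumentException("not an invokevirtual at pc " + pc);
        }
        // Two operand bytes, high byte first, form an unsigned 16-bit index.
        return ((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] code = { (byte) 0xb6, 0x00, 0x09 };
        System.out.println(constantPoolIndex(code, 0)); // 9
    }
}
```

With the constant pool of Figure 3, index 9 would select the METHOD entry for Point::draw.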


The JVM Specification (section 5.4.3.3) provides a re- 
cursive algorithm for resolving a method reference and 
locating the correct method: Beginning with the meth- 
ods defined for the precise receiver argument type, scan 
for an exact match for the name and signature. If one 
is not found, search the superclass⁷ of the receiver argument,
continuing up the superclass chain until Object,
the root of the type hierarchy, is searched. If an exact 
match is not found, throw an AbstractMethodError. 
This look-up process applies to each of the invoke 
bytecodes. 
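

The look-up described above can be approximated at the Java level with reflection. The sketch below walks from the receiver's class up the superclass chain and returns the first method whose name and parameter types match exactly. It is an analogy only: the JVM performs this search on its internal class structures, and the names LookupSketch, Component, and SubComponent used here are hypothetical.

```java
import java.lang.reflect.Method;

final class LookupSketch {
    static class Component { void processEvent(Object e) { } }
    static class SubComponent extends Component { }

    /** Searches receiverClass, then each superclass up to Object, for an exact match. */
    static Method lookup(Class<?> receiverClass, String name, Class<?>... parameterTypes) {
        for (Class<?> c = receiverClass; c != null; c = c.getSuperclass()) {
            try {
                return c.getDeclaredMethod(name, parameterTypes); // exact match only
            } catch (NoSuchMethodException e) {
                // Not declared in this class; continue with the superclass.
            }
        }
        throw new AbstractMethodError(name);
    }

    public static void main(String[] args) {
        Method m = lookup(SubComponent.class, "processEvent", Object.class);
        System.out.println(m.getDeclaringClass().getSimpleName()); // Component
    }
}
```

Because the match is exact, a request for processEvent(String) fails here even though a String argument would be acceptable to processEvent(Object); the widening decision was already made by the compiler.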


This look-up process is a time-intensive operation. To
reduce the overhead of method look-up, the resolved
method is cached in the constant pool alongside the original
method reference. The next time this method reference
is applied by another invoke bytecode, the cached
method is used directly. 


Once a method is resolved, a method-specific invoker is 
executed to begin the interpretation of the new method. 
This invoker performs method-specific operations, such 
as acquiring a lock in the case of synchronized meth- 
ods, constructing a JVM activation record in the case of 
bytecode methods, or preparing a machine-level activa- 
tion record for native methods. 


The Research JVM recognizes a special case in invoking
methods: any private methods, final methods, or constructors
can be handled in a non-virtual mode. None of
these situations requires dynamic dispatch. But,
multi-dispatch will need to handle these special cases.


⁷ Java provides only single inheritance of program code.


3 Design 


We now have sufficient information to describe the gen- 
eral design for extending the JVM to support multi- 
dispatch. In short, we mark classes which are to use 
multi-dispatch and replace their method invokers with 
one that selects a more specific method based on the actual
arguments. Hence, existing uni-dispatch method invocations
are not changed in any way.


Marking the .class files without changing the lan- 
guage syntax is straightforward. We created an empty 
interface MultiDispatchable and any class which 
will provide multi-dispatch methods must implement 
that interface. The .class file retains that interface 
name and the virtual machine can easily check for this at
class loading time. Our implementation does not change 
the syntax of the Java programming language or the bi- 
nary .class file format in any way. 


Our interface-based technique allows us to retain com- 
patibility with existing programs, compilers, and li- 
braries. Any class that implements our marker interface 
has different semantics for dispatch. But, the semantics 
of existing uni-dispatch programs and libraries are not 
changed since they do not implement the interface. The 
programmer retains complete control and responsibility
for designating multi-dispatchable classes. This allows 
the developer to consciously target the multi-dispatch 
technique to known programming situations, such as 
double dispatch. 


At dispatch time, our multi-invoker executes instead of 
the original JVM invoker. Our invoker locates a more- 
precise method based on the dynamic types of the invo- 
cation arguments and executes it in place of the original 
method. 
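

The idea of a marker interface combined with a replacement invoker can be imitated in plain Java. The sketch below is a user-level analogy built on reflection, not the modified JVM: it checks for a stand-in marker interface and then picks the declared method that best matches the dynamic argument types. It assumes reference-type parameters and a unique most specific method (it does not detect ambiguity). All names other than MultiDispatchable are hypothetical, and the interface here is a local stand-in for the one described in the text.

```java
import java.lang.reflect.Method;

final class MultiInvokerSketch {
    /** Stand-in for the empty marker interface described in the text. */
    interface MultiDispatchable { }

    static class A { }
    static class B extends A { }

    static class Driver implements MultiDispatchable {
        String m(A a1, A a2) { return "AA"; }
        String m(A a1, B b2) { return "AB"; }
        String m(B b1, A a2) { return "BA"; }
        String m(B b1, B b2) { return "BB"; }
    }

    /** True if every argument is null or an instance of the matching parameter type. */
    static boolean accepts(Class<?>[] params, Object[] args) {
        if (params.length != args.length) return false;
        for (int i = 0; i < params.length; i++) {
            if (args[i] != null && !params[i].isInstance(args[i])) return false;
        }
        return true;
    }

    /** True if each parameter type of a can be widened to the matching one of b. */
    static boolean atLeastAsSpecific(Method a, Method b) {
        Class<?>[] pa = a.getParameterTypes();
        Class<?>[] pb = b.getParameterTypes();
        for (int i = 0; i < pa.length; i++) {
            if (!pb[i].isAssignableFrom(pa[i])) return false;
        }
        return true;
    }

    /** Invokes the applicable method that is most specific for the dynamic argument types. */
    static Object invoke(Object receiver, String name, Object... args) {
        if (!(receiver instanceof MultiDispatchable)) {
            throw new IllegalArgumentException("receiver is not marked MultiDispatchable");
        }
        Method best = null;
        for (Method m : receiver.getClass().getDeclaredMethods()) {
            if (!m.getName().equals(name) || !accepts(m.getParameterTypes(), args)) continue;
            if (best == null || atLeastAsSpecific(m, best)) best = m;
        }
        if (best == null) throw new IllegalArgumentException("no applicable method: " + name);
        try {
            best.setAccessible(true);
            return best.invoke(receiver, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        A anA = new A();
        A aB = new B();
        Driver d = new Driver();
        System.out.println(d.m(aB, aB));             // AA: javac binds to m(A, A)
        System.out.println(invoke(d, "m", anA, aB)); // AB: chosen from dynamic types
        System.out.println(invoke(d, "m", aB, aB));  // BB: chosen from dynamic types
    }
}
```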


The non-virtual mode invocations need to be handled 
specially. Constructors are never multi-dispatched. We 
found that constructor chaining within a class could 
cause infinite loops. Private and final multi-methods are 
still multi-dispatched. 


We implemented two different dispatch algorithms. 
First, MSA implements a dynamic version of the 
Java Most Specific Applicable algorithm used by the 
javac compiler. Second, Single Receiver Projections 
(SRP) [17] is a high performance table-based technique 
developed at the University of Alberta. We examine both 
a framework-based SRP and a tuned SRP implementa- 
tion. Section 6 provides implementation details, but we 
first present the results of our experiments. 


USENIX Association 




4 Experimental Results 


So far, we have used four different micro-benchmarks 
and a new implementation of Swing/AWT to test our 
multi-dispatcher. 


The first micro-benchmark uses the javac compiler 
to recompile itself while running on the multi-dispatch 
VM. The javac compiler has not been modified, there- 
fore the experiment demonstrates the backward compat- 
ibility of the modified VM for uni-dispatch applications. 
The measured overheads of uni-dispatch javac running 
on the multi-dispatch VM are minimal. The other three 
micro-benchmarks demonstrate multi-dispatch correct- 
ness, multi-dispatch performance as compared to dou- 
ble dispatch, and multi-dispatch performance as arity 
increases. All of the micro-benchmarks are single- 
threaded. 


For our application-level tests, we modified Swing, the 
second-generation GUI library bundled with Java 2, to 
use multi-dispatch. As expected, Swing is a double- 
dispatch-intensive library. We also converted AWT be- 
cause Swing depends heavily on AWT to dispatch the 
events into top-level Swing components. 


All experiments were executed on a dedicated Intel-architecture
PC equipped with two 550 MHz Celeron
processors, a 100 MHz front-side bus, and 256 MB of
memory. The operating system is Linux 2.2.16 with
glibc version 2.1. The Sun Linux JDK 1.2.2 code was
compiled using GNU C version 2.95.2, with optimization
flags as supplied by Sun’s makefiles⁸. The table-based
multi-dispatch code [22] was compiled using GNU
G++ version 2.95.2⁹. The Sun JDK only supports the
green threading model, which is implemented using
pthreads under Linux. We report average and standard
deviations for 10 runs of each benchmark.


We tested three different virtual machines. First, we 
have jdk, the standard JDK 1.2.2 Linux runtime, run- 
ning in interpreter mode. This JVM serves as a baseline 
for comparing the remaining four multi-dispatch sys- 
tems. Second, we have a non-JIT multi-dispatch JVM 
with three different multi-dispatch techniques, jdk-MSA, 
and two implementations (jdk-fSRP, and jdk-tSRP) of 
the same algorithm. Third, we have customized OpenJIT 
1.1.15 to be compatible with our multi-dispatch JVM. 


For the first and second micro-benchmarks, (Tables 2 
and 3) we report user+system time in seconds, along 
with normalized values against the jdk runtime. For the 
third and fourth experiments (Table 4 and Figure 7), we
describe individual dispatch times in microseconds,
ignoring other costs. In the final benchmark, Swing, we
report execution times for a synthetic application that
creates a number of components and inserts 200,000 events
into the event queue.


⁸ Typical flags are -O2

⁹ with options -ansi -fno-implicit-templates
-fkeep-inline-functions -O2.


4.1 Javac — Compatibility Test 


The first experiment requires the runtime to load and 
execute the javac compiler to translate the entire 
sun.tools hierarchy of Java source files into .class
files. This hierarchy includes 234 source files encompassing
49,798 lines of code (excluding comments).
Each compilation was verified by comparing the error
messages¹⁰ and by checksumming the generated binaries.
Each virtual machine passed the test; the timing
results are shown in Table 2. These times come from the 
Unix time user command and are averages, with stan- 
dard deviation, of 10 runs. 


JVM        Time in sec. (σ)        Norm.
jdk        65.41 + 0.25 (0.39)     1.00
[rows for the multi-dispatch JVMs were lost in extraction]


Table 2: Compatibility Testing and Performance
(User+System Time to Recompile sun.tools, in seconds)


The negligible differences between the uni-dispatch 
and multi-dispatch execution times demonstrate that 
the overhead of running uni-dispatch code on a multi- 
dispatch VM is essentially zero. Note that in our im- 
plementation, table-based JVMs do not construct a dis- 
patch table until the first multi-dispatchable method is 
inserted. 


4.2 Simple Multi-Dispatch 


In this micro-benchmark, we show that multi-dispatch 
is correct and measure its overhead. The testing code 
is short and is shown in Figure 6. Note that class
MDJDriver implements the marker interface
MultiDispatchable. The compiler uses static multi-dispatch to
code all four calls to MDJDriver.m(X, X) to execute
the method for two arguments of type A, because that is
the static type of both anA and aB. Multi-dispatch
actually selects among the four methods based upon the
dynamic types of the arguments. Therefore, correct out- 
put consists of 100,000 repetitions of four consecutive 
lines: AA, AB, BA, and BB. For timing purposes, all out- 
put was redirected to /dev/null to reduce the impact 
of input/output. Our results are summarized in Table 3. 


The table-based techniques, jdk-fSRP and jdk-tSRP, suf- 
fer from a substantial startup time, whereas jdk-MSA 


¹⁰ There is one warning noting that 8 files used deprecated APIs.


class A { }

class B extends A { }

class MDJDriver implements MultiDispatchable {
    String m(A a1, A a2) { return "AA"; }
    String m(A a1, B b2) { return "AB"; }
    String m(B b1, A a2) { return "BA"; }
    String m(B b1, B b2) { return "BB"; }

    static public void main(String args[]) {
        final int LOOPSIZE = 100000;
        A anA = new A();
        A aB = new B();
        MDJDriver d = new MDJDriver();
        for (int i = 0; i < LOOPSIZE; i++) {
            System.out.println(d.m(anA, anA));
            System.out.println(d.m(anA, aB));
            System.out.println(d.m(aB, anA));
            System.out.println(d.m(aB, aB));
        }
    }
}

Figure 6: Simple Multi-Dispatch Testing Code 


primarily uses existing data structures found in the JVM 
interpreter and lazily computes any additional values. 
This reduces the cost of program startup. 


JVM        Time in sec. (σ)        Norm.   Correct
jdk        26.40 + 0.68 (0.07)     1.00    No
jdk-MSA    [illegible]             1.10    [illegible]
jdk-fSRP   [illegible]
jdk-tSRP   29.48 + 0.84 (0.17)     1.12    Yes


Table 3: Simple Multi-Dispatch
(User+System Execution Time in seconds)


4.3 Double Dispatch of Events 


Our third experiment involves computing the perfor- 
mance differences between double dispatch and the two 
multi-dispatch implementations of the example given in 
Figure 2. We constructed a synthetic type hierarchy of 
AWTEvent classes, to match those in Figure 2. The dis- 
cussion of Swing follows in Section 4.5. We also con- 
structed three different component types: 


Double Dispatch (DD) implements double dispatch
via type-cases and programmer-coded type numbering
as shown in Figure 2(a).¹¹


Multi-Dispatch (MD) implements multi-dispatch as
shown in Figure 2(b), where the type-cases from
DD have been replaced with multi-dispatch.


Full Multi-Dispatch (FMD) eliminates the type-cases
and the programmer-coded type-numbering from
DD. It divides MouseEvent into two different
classes and eliminates the switch statement.


¹¹ Type-cases are not the most effective double-dispatch technique,
but this code matches Sun’s AWT implementation. For a comparison
with other double-dispatch techniques, see [8, 13].


To avoid inlining effects, we added code for updating 
an instance variable to the body of each
processEvent(AWTEvent). This experiment consists of dispatching
a total of one million events through
processEvent(AWTEvent). Each event type appears equally
often, as we iterate over an array containing equal num- 
bers of each event. We compute the loop overhead, sub- 
tract the overhead amount, and then divide the remaining 
time by the number of events dispatched. The timing re- 
sults are shown in Table 4. 
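

The per-dispatch figure is simple arithmetic: subtract the time of an empty loop from the time of the dispatching loop and divide by the number of events. The Java sketch below spells this out. The method name and the sample measurements are hypothetical and are not values from our experiments.

```java
final class DispatchTimingSketch {
    /** Per-dispatch time in microseconds after removing the measured loop overhead. */
    static double microsPerDispatch(long loopNanos, long emptyLoopNanos, long events) {
        return (loopNanos - emptyLoopNanos) / 1000.0 / events;
    }

    public static void main(String[] args) {
        // Hypothetical measurements: 0.8 s for the dispatch loop, 0.3 s for the empty loop.
        System.out.println(microsPerDispatch(800_000_000L, 300_000_000L, 1_000_000L)); // 0.5
    }
}
```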


Also, we give an additional timing value for our custom
SRP implementation, where we disabled mutual exclusion
in the dispatcher. Currently our implementation
uses a costly monitor to ensure that no other thread is updating
the dispatch tables during a multi-dispatch. High-performance
concurrent-read exclusive-write protocols
can eliminate this overhead; the nolock value represents
this highest-performance case.


As DD does not declare itself multi-dispatchable, the 
similarity of the results in column 2 of Table 4 again 
shows that our multi-dispatchable virtual machines do 
not significantly penalize uni-dispatch code. Further, 
we see that the cost of interpreting numerous expen- 
sive JVM bytecodes, such as instanceof, followed by 
another invokevirtual (which is DD’s strategy), is 
more costly than our multi-dispatch techniques. The full 
multi-dispatch implementation (FMD) is faster than the 
partial multi-dispatch (MD). This is reasonable because 
MD ends up double-dispatching two of every six events. 


Again, we see that the framework-based SRP technique 
suffers from considerable initial overhead. We hypothe- 
size that it is a result of the object-oriented nature of our 
implementation of the table-based techniques. In each 
dispatch, several C++ objects are created and destroyed 
on the heap. Our tuned SRP implementation, jdk-tSRP, 
removes this overhead and provides faster dispatch per- 
formance than programmer-coded double dispatch. 


OpenJIT compilation gains only minor improvements
for the multi-dispatch system. This matches our expectations:
because OpenJIT calls the same selectMultiMethod()
routine that the interpreter uses, there is only
a slight benefit from avoiding some interpreter frame
manipulations.


JVM        Time (σ) values, in extraction order

jdk-MSA    2.55 (0.04)   2.43 (0.03)
jdk-fSRP   3.12 (0.08)   2.52 (0.05)   0.96 (0.01)   2.90 (0.05)   2.47 (0.05)
jdk-tSRP   0.75 (0.03)   0.72 (0.02)   0.95 (0.00)   0.74 (0.02)   0.71 (0.01)
nolock     0.95 (0.00)   0.34 (0.00)   0.32 (0.00)   0.95 (0.00)   0.32 (0.00)

[Column groups DD, MD, and FMD. The jdk row, the alignment of cells
to columns, and some cells were lost in extraction; the label of the
first row is inferred.]

Table 4: Event Dispatch Comparison
(Call-site Dispatch Times in microseconds)


4.4 Arity Effects


Our final micro-benchmark explores the time penalties
as the number of dispatchable arguments and applicable
methods grow. To do this, we built a simple hierarchy
of five classes (one root class A, with three subclasses
B, C, and D, and finally class E as a subclass of C) and
constructed methods of different arities against that
hierarchy. We defined the following methods:


• classes A, B, C, D, and E contain unary methods
R.m() (where R represents the receiver argument
class).


• classes A, B, C, D, and E also implement five binary
methods, R.m(X) where X can be any of A, B, C, D,
or E.


• classes A, B, C, D, and E implement 25 ternary
methods, R.m(X, Y) where X and Y can be any of A, B,
C, D, or E.


• classes A, B, C, D, and E implement 125 quaternary
methods, R.m(X, Y, Z) where X, Y, and Z can be
any of A, B, C, D, or E.


MSA looks at one fewer dispatchable arguments than 
the table-based techniques because the receiver argu- 
ment has already been dispatched by the JVM. For in- 
stance, given a unary method, MSA makes no widen- 
ing conversions for dispatchable arguments. A binary 
method requires MSA to check only one widening con- 
version. The table-based techniques dispatch on all ar- 
guments and gain no benefit from the dispatch done by 
the JVM. 


We invoke one million methods for each arity. This 
means that each of the unary methods is executed 
200,000 times. However each of the quaternary methods 
is executed only 1,600 times. After computing the loop 
overhead via an empty loop, we determine the elapsed 
time to millisecond accuracy and determine the time 
taken for each dispatch. Our results are shown in Fig- 
ure 7. 
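

The invocation counts quoted above follow from the number of distinct methods at each arity. The Java sketch below reproduces that arithmetic, assuming the one million invocations are spread evenly over the methods of a given arity; the class and method names are hypothetical.

```java
final class ArityCountsSketch {
    static final int CLASSES = 5;              // A, B, C, D, and E
    static final long INVOCATIONS = 1_000_000; // invocations per arity

    /** Number of distinct methods of the given arity (receiver included). */
    static long methodsForArity(int arity) {
        long n = 1;
        for (int i = 0; i < arity; i++) n *= CLASSES;
        return n;
    }

    /** Executions of each individual method when invocations are spread evenly. */
    static long invocationsPerMethod(int arity) {
        return INVOCATIONS / methodsForArity(arity);
    }

    public static void main(String[] args) {
        for (int arity = 1; arity <= 4; arity++) {
            System.out.println(arity + ": " + methodsForArity(arity)
                    + " methods, " + invocationsPerMethod(arity) + " invocations each");
        }
        // 1: 5 methods, 200000 invocations each
        // 2: 25 methods, 40000 invocations each
        // 3: 125 methods, 8000 invocations each
        // 4: 625 methods, 1600 invocations each
    }
}
```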


We can evaluate the arity effects in the uni-dispatch case 
by coding a third level of double dispatch. Already the 
overhead of constructing a third activation record ex- 
ceeds the dispatch time of our tuned SRP implementa- 


tion. Also, our SRP implementations suffer only lin- 
ear growth in time-penalties as arity increases, whereas 
MSA suffers quadratic effects. 


[Plot: Arity Effects on Multi-Dispatch; horizontal axis: Arity (including single receiver)]


Figure 7: Impact of Arity on Dispatch Latency 


4.5 Swing and AWT 


Our final test is to apply multi-dispatch to AWT and 
Swing applications. To do this, we needed to rewrite 
AWT and Swing to take advantage of multi-dispatch. 


We modified 11% (92 out of 846) of the classes in the 
AWT and Swing hierarchies. We eliminated 171 deci- 
sion points, but needed to insert 123 new methods to 
replace existing double-dispatch code sections. Within 
the modified classes, we removed 5% of the condition- 
als and reduced the average number of choice points per 
method from 3.8 to 2.0 per method. This reduction illus- 
trates the value of multi-dispatch in reducing code com- 
plexity. 


In all, 57 classes were added, all of them new event types 
to replace those previously recognized only by a special 
type id (as in the AWT examples described previously). 
Our multi-dispatch libraries are a drop-in replacement 
that executes a total of 7.7% fewer method invocations 
and gives virtually identical performance with applica- 
tions such as SwingSet. In our sample application, 
we found that the number of multi-dispatches executed 
almost exactly equaled the total reduction in method
invocations. This suggests that every multi-dispatch
replaced a double dispatch in the original Swing and AWT
libraries.


[Table 5 fragment: columns Uni-Swing (Methods, Uni-Methods) and
Multi-Swing (Multi-methods); surviving entry: 160 (0.02%)]


We verified the operation of the entire unmodified 
SwingSet application with our replacement libraries. 
Finally to measure performance, we timed a simple 
Swing application that handles 200,000 AWTEvents of
different types. The timing results are given in Table 6. 


              Uni-Swing        Multi-Swing
JVM           Time (σ)         Time (σ)
jdk           28.03 (0.35)
jdk-MSA       28.69 (0.31)     70.09 (0.15)
jdk-tSRP      29.33 (0.42)     28.30 (0.36)


Table 6: Swing Application Execution Time
(Event loop times in seconds)



The Swing and AWT conversion also demonstrates the 
robustness of our approach. We needed to support multi- 
dispatch on instance and static methods. Nolock values 
are not given because Swing breaks our simplification 
that dispatch tables are not updated concurrently, and 
jdk-fSRP values are not given because the framework- 
based system does not support static methods. Swing 
and AWT expect to dispatch differently on Object and 
array types. In modifying the libraries, we found numerous
opportunities to apply multi-dispatch to private, protected,
and super method invocations. In addition, several
multi-methods required the JVM to accept covariant
return types from multi-methods. All of these features 
are required for a mainstream programming language. 


5 Multi-Dispatch Issues


Besides performance and correctness, multi-dispatch 
must contend with a number of serious difficulties which 
the javac compiler cannot recognize. They are: am- 
biguous method invocations caused by inheritance con- 
flicts, incompatible return type changes, masking of 
methods by primitive widening operations, and null ar- 
guments. Each of these is illustrated in Figure 8. We 
have developed a tool called MDLint that can identify 
these problems and warn the programmer. 


The first difficulty is that multi-dispatch, even in a 
single-inheritance language, can suffer from ambiguous 
methods. The two examples using the m1 methods illustrate
this. For the first method invocation, the compiler
knows that A.m1(B) and B.m1(A) are candidates. Neither
one is more specific than the other, so the compiler
aborts with an error. We can fix that by statically typing
the receiver argument to A, but multi-dispatch sees exactly
the same conflict at runtime. Our MDLint program
warns about the problem. If the programmer disregards
the warning, our JVM detects the error and throws an
AmbiguousMethodException.


[Table 5 fragment: surviving entry 2,350,172 (1.1%)]

Table 5: Swing Application Method Invocations


Throwing a runtime exception may seem neither elegant 
nor acceptable, but one of the key attributes of the JVM 
is to maintain security. A malicious programmer can 
separately compile each class so that errors are not evi- 
dent until execution. The JVM must protect itself from 
these possibilities, and throwing an exception is the only 
option. As we noted, our MDLint tool can recognize 
and report potential ambiguities, exception inconsisten- 
cies and return-type conflicts at compile time. 


The second difficulty centers around the fact that javac
considers methods with different argument types as distinct.
This means that they can have different return
types. Multi-dispatch forges additional connections between
classes based on the additional dispatchable arguments.
This means that methods which javac considered
distinct are now overriding each other. In the example,
we see that the two m2(...) methods override each
other for multi-dispatch. Our multi-dispatch implementations
throw an IllegalReturnTypeChange exception,
unless the more specific method returns a subtype
of the original returned value.


Another ramification of the fact that uni-dispatch Java 
considers different argument combinations as distinct 
methods is that javac does not ensure that the throws 
clauses are compatible. As with any overriding 
method, we would want a more specific multi-method to 
covariantly-specialize the set of exceptions. Our type- 
checker validates this, but, in compliance with the VM 
specification, our virtual machine neither checks nor re- 
ports this inconsistency. 


The third difficulty involves the use of literal null as an 
argument. If null is typed, as in the first invocation of
m3(), then javac performs static multi-dispatch with
that type. This restricts the set of applicable methods
javac will consider. In our example, an ordinary JVM
can avoid loading class C. The multi-dispatch JVM
recognizes that m3(C) might apply (since a is dynamically
of null type and null is a subtype of class C). Therefore,
multi-dispatch Java loads class C in order to determine
its place in the type hierarchy, and decides that m3(C) is
the most-specific method. Literal nulls, as shown in the
second invocation of m3(), illustrate the inconsistency
of standard Java; it now agrees with the multi-dispatch
JVM that m3(C) should be invoked. The ordinary JVM
can still avoid loading class C, because javac has
already static multi-dispatched to m3(C)¹². Presumably,
the argument is used in m3(C), so the ordinary JVM will
end up loading class C, just like the multi-dispatch JVM.


class A {
  void m1(B b1) {...}
  void m4(int i) {...}}

class B extends A {
  void m1(A a1) {...}
  void m4(byte b) {...}}

class C extends B {...}

class MDJIssues {
  int m2(A a1, A a2) {...}
  String m2(B b1, B b2) {...}
  void m3(A a1) {...}
  void m3(B b1) {...}
  void m3(C c1) {...}

  public static void main(String args[]) {
    A Ab = new B();      // static: A, dynamic: B
    B Bb = new B();      // static: B, dynamic: B

    // multi-dispatch difficulties
    Bb.m1(Bb);           // javac: ambiguous method
    Ab.m1(Bb);           // javac: OK, MDJ: ambiguous

    // incompatible return type change
    int i = m2(Bb, Bb);  // javac: bad return type
    int j = m2(Ab, Ab);  // javac: OK, MDJ: exception

    // null arguments are more consistent
    A a = null;
    m3(a);               // regular Java: executes m3(A)
                         // MDJ: loads C, executes m3(C)
    m3(null);            // both execute m3(C)

    // stronger referential integrity
    m3(Ab);              // regular Java: executes m3(A)
                         // MDJ: executes m3(B)
    m3(new B());         // both execute m3(B)

    // primitive widening hides correct method
    byte b = 7;
    Ab.m4(b);            // javac: widens, calls A.m4(int)
                         // MDJ: ignores B.m4(byte), calls A.m4(int)
    Ab.m4((int) b);      // programmer widening
  }
}

Figure 8: Examples of Multi-Dispatch Issues
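The standard-Java half of the null-argument discussion can be reproduced with an unmodified compiler. The following is a minimal sketch; the class name NullArgumentDemo and the string return values are illustrative and do not come from the paper. It shows only javac's static overload selection and says nothing about the multi-dispatch JVM.

```java
class NullArgumentDemo {
    static class A {}
    static class B extends A {}
    static class C extends B {}

    static String m3(A a) { return "m3(A)"; }
    static String m3(B b) { return "m3(B)"; }
    static String m3(C c) { return "m3(C)"; }

    public static void main(String[] args) {
        A a = null;
        // The variable has static type A, so javac binds the call to m3(A).
        System.out.println(m3(a));
        // A literal null has the null type, which converts to A, B, and C;
        // javac picks the most specific applicable overload, m3(C).
        System.out.println(m3(null));
    }
}
```

The two calls pass the same value at run time, yet standard Java executes different methods, which is the inconsistency the text describes.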


The null argument problem is an example of a more general
referential transparency problem in Java. Inconsistent
invocations can occur when expressions are substituted in
place of variables. This is because javac might apply more
precise type information from the substituted expression.
As an example, compare the execution of the third and
fourth invocations of m3(...). By replacing Ab with its
value, we have altered the execution of a program.


12 There is a subtlety here because javac selects the most-specific
method from the method dictionary of the static type of the receiver.
Therefore, dynamic uni-dispatch still may not select the most-specific
method of the receiver’s dynamic class.


The last difficulty is more complex and, at this time,
unsolved. The compiler selects a method based upon
widening operations and may change the type of primitive
arguments. In the example, the compiler inserts
instructions to convert b from a byte to an int. At
runtime, we have lost all traces that b was originally
specified as a byte. Indeed, the programmer might have
wanted to force that exact conversion; the bytecodes
would be identical to compiler-generated conversions.
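The widening behaviour of standard Java can be checked directly. This is a small sketch modelled on the m4 methods of Figure 8; the class name WideningDemo, the helper method names, and the string return values are illustrative assumptions. B.m4(byte) is an overload of A.m4(int), not an override, so it is invisible through a receiver of static type A.

```java
class WideningDemo {
    static class A {
        String m4(int i) { return "A.m4(int)"; }
    }
    static class B extends A {
        String m4(byte b) { return "B.m4(byte)"; }  // overload, not override
    }

    // Receiver has static type A: only m4(int) is visible, so javac widens b.
    static String compilerWidening() {
        A ab = new B();
        byte b = 7;
        return ab.m4(b);
    }

    // The programmer forces the same conversion with an explicit cast.
    static String programmerWidening() {
        A ab = new B();
        byte b = 7;
        return ab.m4((int) b);
    }

    // Receiver has static type B: m4(byte) is visible and more specific.
    static String noWidening() {
        B bb = new B();
        byte b = 7;
        return bb.m4(b);
    }

    public static void main(String[] args) {
        System.out.println(compilerWidening());
        System.out.println(programmerWidening());
        System.out.println(noWidening());
    }
}
```

The first two helpers select the same method, A.m4(int), whether the widening is inserted by the compiler or written by the programmer, which is why the two cases cannot be told apart afterwards.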


6 Implementation 


In this section, we describe how the JVM is extended to
support dynamic multi-dispatch. We begin by examining
how to indicate to the JVM which classes are
multi-dispatchable. We then examine where multi-dispatch
must occur and, finally, we review three different
multi-dispatch implementations.


6.1 Marking Multi-Dispatch Classes 


We tell the JVM that multi-dispatch is required on a
class-by-class basis by implementing the empty interface
MultiDispatchable in each class that is multi-dispatchable.
The Java programming language has already leveraged this
idea for marking class capabilities with the Cloneable
interface. We use the MultiDispatchable interface to
denote that any method sent to a multi-dispatch receiver
should be handled by the multi-dispatcher. For efficiency,
we add a flag to the internal class representation to
indicate that a class is multi-dispatchable, rather than
searching its list of interfaces at each method invocation.
The value of this flag is set once, at class load time.
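The marker-interface idea and the cached flag can be mimicked at the Java level. The sketch below is only an analogy: in the paper the flag lives inside the VM's internal class representation, whereas here a java.lang.ClassValue caches the answer per class. The names MarkerFlagDemo, Shape, Circle, and Plain are hypothetical.

```java
class MarkerFlagDemo {
    /** Empty marker interface, in the style of Cloneable. */
    interface MultiDispatchable {}

    static class Shape implements MultiDispatchable {}
    static class Circle extends Shape {}   // inherits the marker
    static class Plain {}                  // uni-dispatch only

    /** Computed once per class, then cached; stands in for the VM's per-class flag. */
    static final ClassValue<Boolean> MULTI_FLAG = new ClassValue<Boolean>() {
        @Override
        protected Boolean computeValue(Class<?> type) {
            return MultiDispatchable.class.isAssignableFrom(type);
        }
    };

    /** The check made at each invocation: a cached lookup, not an interface search. */
    static boolean needsMultiDispatch(Object receiver) {
        return MULTI_FLAG.get(receiver.getClass());
    }

    public static void main(String[] args) {
        System.out.println(needsMultiDispatch(new Circle()));
        System.out.println(needsMultiDispatch(new Plain()));
    }
}
```

Because interfaces are inherited, Circle is treated as multi-dispatchable without naming the marker itself.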


Our selection of MultiDispatchable as the marker
requires us to recognize multi-dispatch on a class-by-class
basis, not on a method-by-method or argument-by-argument
basis. That is, every method invocation where the
uni-dispatch receiver is a member of a multi-dispatchable
class goes through our multi-dispatcher. Furthermore,
because interfaces are inherited, this approach requires
any subclass of a multi-dispatchable class to also be
multi-dispatchable. Most importantly, any method
invocation where the receiver argument is not marked for
multi-dispatch continues unchanged through the
uni-dispatcher. The benefit of this is that the syntax of
Java programs is unchanged, and the performance and
semantics of uni-dispatch remain intact.


The techniques used to mark code as multi-dispatchable
and to implement multi-dispatch method invocations
are independent. MultiDispatchable marks entire
classes without language extensions, but our JVM
actually supports multi-dispatch on a method-by-method
basis. An alternative tagging mechanism that marks
individual methods as multi-dispatchable may be possible
if we permitted language extensions.


6.2 Adding Multi-Dispatch 


As part of the uni-dispatch of an invoke bytecode, the
JVM finds a method pointer from the array of methods
in the receiver argument class. At this point, the
interpreter loop is about to build a new frame to execute
the found method. The interpreter loop (and classic VM
JIT compilers) proceed to call a special function, called
the invoker, that handles the details of building the
new frame and starting the new method. The Research
JVM uses different invokers for native, bytecode,
synchronized, JIT-compiled, and other method types.
Similar to the OpenJIT system [21], we replace this invoker
function with a custom multi-invoker that computes the
correct multi-dispatch method. Once the more precise
method is known, we simply invoke it directly.


The multi-invoker is installed at class-load time. The 
interpreter loop and invoker for uni-dispatch are un- 
changed. This supports our claim that uni-dispatch pro- 
grams and libraries suffer no execution time penalties. 
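The invoker replacement can be modelled in a few lines. The following is a sketch under stated assumptions, not the VM's C code: MethodBlock, Invoker, installMultiInvoker, and interpreterCall are hypothetical Java stand-ins for the VM structures named in the text, and the method selector is passed in as a function so that the example stays independent of any particular dispatch technique.

```java
import java.util.function.BiFunction;

class InvokerSwapDemo {
    /** Hypothetical stand-in for the VM routine that builds a frame and starts a method. */
    interface Invoker {
        String invoke(MethodBlock found, Object[] args);
    }

    /** Hypothetical stand-in for the VM's per-method record. */
    static final class MethodBlock {
        final String signature;
        final Invoker originalInvoker = (mb, args) -> "run " + mb.signature;
        Invoker invoker = originalInvoker;   // what the interpreter loop calls

        MethodBlock(String signature) { this.signature = signature; }
    }

    /** Done once, at class-load time, for each method of a multi-dispatchable class. */
    static void installMultiInvoker(
            MethodBlock mb,
            BiFunction<MethodBlock, Object[], MethodBlock> selectMultiMethod) {
        mb.invoker = (found, args) -> {
            MethodBlock precise = selectMultiMethod.apply(found, args);
            // Start the more precise method through its own original invoker.
            return precise.originalInvoker.invoke(precise, args);
        };
    }

    /** The interpreter side is the same for uni- and multi-dispatch methods. */
    static String interpreterCall(MethodBlock found, Object... args) {
        return found.invoker.invoke(found, args);
    }

    public static void main(String[] args) {
        MethodBlock general = new MethodBlock("m3(A)");
        MethodBlock precise = new MethodBlock("m3(B)");
        // Toy selector: redirect to the precise method for String arguments.
        installMultiInvoker(general, (found, a) -> a[0] instanceof String ? precise : found);
        System.out.println(interpreterCall(general, "x"));
        System.out.println(interpreterCall(precise, "x"));
    }
}
```

Methods whose invoker is never replaced take exactly the original path, which mirrors the claim that uni-dispatch code pays nothing for the extension.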


OpenJIT is supported in exactly the same way. Every
method contains a compiledCode function pointer
onto which OpenJIT installs its compiled method body.
Once the compilation is complete, OpenJIT saves the
compiled method body of any multi-method to a new
field oldCompiledCode and installs a pointer to a
routine DispatchMulti(). This replacement invoker simply
calls the same method specializer selectMultiMethod()
that the interpreter uses. If the more precise
method-body is already compiled, then OpenJIT jumps
into the oldCompiledCode, executing the more specific
compiled method. Alternately, if the more precise
method is not already JIT-ed, then DispatchMulti()
sets it to be compiled and invokes the interpreter on the
bytecode version.


Unfortunately, we must disable much of the inlining 
facility of OpenJIT when using multi-dispatch. The 
uni-dispatch OpenJIT compiler can inline private, 
static, and final methods because they can never 
change. With multi-dispatch, this is no longer true — at 
a given call-site, the selected multi-method may change 
depending on the arguments to the current invocation. 
The JIT compiler and VM must work together to en- 
sure that every method invocation is checked for multi- 
dispatch and correctly specialized. 


The core component of our system is the
selectMultiMethod() routine, which locates a more-specific
method applicable to a set of arguments. We have
experimented with three different multi-dispatch techniques;
they are examined in the following sections. For each
technique, we also describe our solution for the
implementation issues described in section 5.


6.3 Reference Implementation: MSA 


Our reference implementation is an extension of the 
Most Specific Applicable algorithm described in section 
15.11 of The Java Language Specification and in sec- 
tion 2.2 of this paper. In particular, we re-examine the 
steps described in section 2.2 in light of the dynamic ar- 
gument types being used. 


When the multi-invoker is called, it has access to the 
methodblock that has already been found by the uni- 
dispatch resolution mechanism. We also have the top of 
the operand stack, so we can peek at each of the argu- 
ments. Last, we have the actual receiver, which can pro- 
vide the list of methods (including inherited ones) that it 
implements. 


Every method is represented by a methodblock
containing many useful pieces of information. First, it holds
the name of the method. Second, it contains a handle
to the class that contains this method¹³. Third, it
contains the signature which we can parse to get the arity
and types of the dispatchable arguments. For performance,
we parse the signature only once. We add two
fields to the methodblock: int arity to cache the
arity and ClassClass **argClass to hold the class
handles for the dispatchable arguments.


With these three pieces of information, we implement a 
dynamic version of the MSA algorithm directly. Wher- 
ever the original algorithm would use the static type 
of an argument, we apply the known dynamic type in- 
stead. In step 2(b) from section 2.2, the compiler would 
compare the static type of each argument with the cor- 
responding declared type for the candidate method. In 
the dynamic case, we have the arguments on the stack, 
so we can find their dynamic types. We compare each 
argument’s dynamic type against the declared type of 
the corresponding argument of the method. We dis- 
card any method that is not applicable due to access 
rights (private methods) or whose declared types do 
not match the arguments on the stack. The remaining 
methods are dynamically applicable. 


The issue of null-valued arguments becomes significant
at this point. JLS chapter 4 recognizes the need for a
null type to represent (untyped) null values. It further
declares in section 4.1 that the null type can be coerced
to any non-primitive type. Also, section 5.1.4 allows null
types to be widened to any object, array or interface type.
Statically, this means that an (untyped) null argument
can be widened to any class. In the dynamic case, we
want to do the same. Therefore, whenever we encounter
a null argument we accept the conversion of that null to
a method argument of type class, array, or interface.


13 Recall that methods might be inherited; this class handle is the
original implementing class.


Unfortunately, if we have a null argument, we may retain 
a method which accepts arguments of classes that are not 
yet loaded. We need to force these classes to be loaded 
to ensure that the next step operates correctly. 


Given the list of applicable methods, step 2(d) finds the 
unique most specific method. Again the operation is 
identical to the process that the javac compiler fol- 
lows. One applicable method is tentatively selected as 
the most specific. Each other applicable method is tested 
by comparing argument by argument (including the re- 
ceiver argument) against the tentatively most specific. 
At each step, we discard any methods that are less spe- 
cific. We continue this process until only one candi- 
date method remains, or two or more equally specific 
methods remain. In the latter case, we have an ambigu- 
ous method invocation and we throw an Ambiguous- 
MethodException to advertise this fact. 
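The dynamic selection steps described above can be approximated with Java reflection. This is a sketch under stated assumptions, not the VM implementation: it uses java.lang.reflect.Method where the paper uses the internal methodblock, it handles reference-typed arguments only (no primitive widening), and the classes Target, A, B, and C as well as the local AmbiguousMethodException are defined here purely for illustration.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class DynamicMsaDemo {
    static class A {}
    static class B extends A {}
    static class C extends B {}
    static class AmbiguousMethodException extends RuntimeException {
        AmbiguousMethodException(String msg) { super(msg); }
    }
    static class Target {
        public String m3(A a) { return "m3(A)"; }
        public String m3(B b) { return "m3(B)"; }
        public String m3(C c) { return "m3(C)"; }
        public String m1(A x, B y) { return "m1(A,B)"; }
        public String m1(B x, A y) { return "m1(B,A)"; }
    }

    /** Applicable if each dynamic argument fits the declared type; null fits any reference type. */
    static boolean applicable(Method m, Object[] args) {
        Class<?>[] declared = m.getParameterTypes();
        if (declared.length != args.length) return false;
        for (int i = 0; i < args.length; i++) {
            boolean fits = (args[i] == null)
                    ? !declared[i].isPrimitive()
                    : declared[i].isInstance(args[i]);
            if (!fits) return false;
        }
        return true;
    }

    /** x is at least as specific as y if its receiver class and every parameter are subtypes of y's. */
    static boolean atLeastAsSpecific(Method x, Method y) {
        if (!y.getDeclaringClass().isAssignableFrom(x.getDeclaringClass())) return false;
        Class<?>[] px = x.getParameterTypes();
        Class<?>[] py = y.getParameterTypes();
        for (int i = 0; i < px.length; i++) {
            if (!py[i].isAssignableFrom(px[i])) return false;
        }
        return true;
    }

    /** Returns the unique most specific applicable method, or throws if there is none. */
    static Method select(Object receiver, String name, Object... args) {
        List<Method> candidates = new ArrayList<>();
        for (Method m : receiver.getClass().getMethods()) {
            if (m.getName().equals(name) && applicable(m, args)) candidates.add(m);
        }
        if (candidates.isEmpty()) throw new IllegalArgumentException("no applicable method");
        for (Method m : candidates) {
            boolean best = true;
            for (Method other : candidates) {
                if (!atLeastAsSpecific(m, other)) { best = false; break; }
            }
            if (best) return m;
        }
        throw new AmbiguousMethodException(name + ": no unique most specific method");
    }

    /** Renders a method as name(ParamTypes) for printing. */
    static String describe(Method m) {
        StringBuilder sb = new StringBuilder(m.getName()).append('(');
        Class<?>[] p = m.getParameterTypes();
        for (int i = 0; i < p.length; i++) {
            sb.append(i == 0 ? "" : ",").append(p[i].getSimpleName());
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        Target t = new Target();
        A ab = new B();                                               // static: A, dynamic: B
        System.out.println(describe(select(t, "m3", ab)));
        System.out.println(describe(select(t, "m3", (Object) null)));
    }
}
```

Calling select(t, "m1", new B(), new B()) leaves both m1 overloads applicable with neither more specific than the other, so the sketch throws its AmbiguousMethodException, matching the ambiguous case in the text.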


Next, we verify that the return type for our more
specific method is compatible with the compiler-selected
method. This check relaxes JLS 8.4.6.3, where we must
reject any invocation that has a different return type,
yet ensures type-safety. If the return type is different,
we throw an IllegalReturnTypeChange exception at
runtime.
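The return-type check amounts to a single comparison between the compiler-selected method and the more specific one. The sketch below expresses it with reflection; the class ReturnTypeCheckDemo, the helper find, the extra m5 methods, and the local IllegalReturnTypeChange class are illustrative assumptions, and the m2 pair is taken from Figure 8.

```java
import java.lang.reflect.Method;

class ReturnTypeCheckDemo {
    static class A {}
    static class B extends A {}
    static class IllegalReturnTypeChange extends RuntimeException {
        IllegalReturnTypeChange(String msg) { super(msg); }
    }

    static int m2(A a1, A a2) { return 0; }       // compiler-selected for (A, A)
    static String m2(B b1, B b2) { return ""; }   // more specific, different return type
    static int m5(A a) { return 1; }
    static int m5(B b) { return 2; }              // more specific, same return type

    /** Accepts the more specific method only if its return type is unchanged. */
    static Method validate(Method compilerSelected, Method moreSpecific) {
        if (!compilerSelected.getReturnType().equals(moreSpecific.getReturnType())) {
            throw new IllegalReturnTypeChange(compilerSelected + " -> " + moreSpecific);
        }
        return moreSpecific;
    }

    /** Looks up one of this class's methods by name and parameter types. */
    static Method find(String name, Class<?>... types) {
        try {
            return ReturnTypeCheckDemo.class.getDeclaredMethod(name, types);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(validate(find("m5", A.class), find("m5", B.class)));
        try {
            validate(find("m2", A.class, A.class), find("m2", B.class, B.class));
        } catch (IllegalReturnTypeChange e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```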


6.4 Table-based Dispatch 


Our framework-based SRP technique is taken from the
Dispatch Table Framework (DTF) [22]. This is a toolkit
of many different uni-dispatch and multi-dispatch
techniques. In order to call the DTF to dispatch a call-site,
we need to inform the DTF of the various classes and
methods present in our Java program. Our interface
consists of a number of straightforward routines to perform
this registration.


The JVM maintains in-memory structures for each
loaded .class file. We have extended that ClassClass
structure to contain a DTF_Type field. It contains
a pointer to the C++ object generated by the DTF. Once
a class is dynamically loaded by the JVM, we check
to see if we must register it with the dispatcher. If the
dispatcher has already been instantiated, we register the
class via javaAddClass(...) and store away the
returned DTF_Type pointer.


If a dispatcher has not been instantiated, and the just- 
loaded class is uni-dispatch only, we defer the regis- 
tration in order to reduce the overhead to uni-dispatch 
programs. If the just-loaded class is marked for multi- 
dispatch and the dispatcher has not been instantiated, the 
process is more complex. First, we instantiate a new dis- 
patcher. Then, we register each class that has already 
been loaded, ensuring that its superclasses and superin- 
terfaces are registered first. 


Finally, as the last part of registering a class with the 
dispatcher, we need to see whether any methods from 
other classes were held in abeyance until this class was 
loaded. This can occur if the methods from other classes 
expect dispatchable arguments of the class we are just 
now loading. As we shall see below, we deferred regis- 
tering these methods until the class was loaded. 


Java’s facility for dynamically reloading classes forces
us to ensure that two classes with the same name are
assigned different DTF_Types. Java ensures that two
classes with the same name are treated as distinct by
insisting that each one is loaded by a different
classloader [19]. We apply the same technique by supplying
the DTF framework with a name consisting of the
classloader name, followed by “::” and followed by the
class name. The system classloader is given the empty
name “”.
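The naming scheme is simple enough to state as code. In this sketch the method qualifiedName, the loader names, and the integer map standing in for DTF_Type handles are all hypothetical; only the "loader name, then ::, then class name" layout and the empty system-loader name come from the text.

```java
import java.util.HashMap;
import java.util.Map;

class LoaderQualifiedNameDemo {
    /** Builds "<loader name>::<class name>"; the system classloader has the empty name. */
    static String qualifiedName(String loaderName, String className) {
        return loaderName + "::" + className;
    }

    public static void main(String[] args) {
        Map<String, Integer> typeIds = new HashMap<>();   // stand-in for DTF_Type handles
        typeIds.put(qualifiedName("", "java.lang.String"), 0);
        typeIds.put(qualifiedName("pluginLoaderA", "Widget"), 1);
        typeIds.put(qualifiedName("pluginLoaderB", "Widget"), 2);
        // Two classes named Widget from different loaders keep separate entries.
        System.out.println(typeIds);
    }
}
```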


For a class marked for multi-dispatch, we need to reg- 
ister its methods along with their types, via java- 
AddMethod(...). If this class implements Multi- 
Dispatchable directly, then we register all of its meth- 
ods, including inherited ones. Alternately, if Multi- 
Dispatchable is an inherited interface for this class, 
then we know that its superclass has already registered 
its methods. Therefore, we do not need to register them; 
we only need to register the methods that we directly 
implement. 


This method registration process is complicated by our
desire to load classes lazily. If a method accepts an
argument with a class not yet seen by the JVM, we know that
we could never dispatch to it until that class is loaded¹⁴.
We set that method aside for future registration.
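The deferral logic can be sketched as a small registry that parks a method until every class it mentions has been loaded. This is an illustrative model only: the class DeferredRegistrationDemo, its string-based class and method names, and the methods classLoaded and addMethod are hypothetical and do not reflect the DTF interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DeferredRegistrationDemo {
    private static final class Pending {
        final String signature;
        final List<String> argClasses;
        Pending(String signature, List<String> argClasses) {
            this.signature = signature;
            this.argClasses = argClasses;
        }
    }

    private final Set<String> loaded = new HashSet<>();
    private final List<String> registered = new ArrayList<>();
    private final Map<String, List<Pending>> waitingOn = new HashMap<>();

    /** Registers the method now, or parks it under the first class that is not yet loaded. */
    void addMethod(String signature, List<String> argClasses) {
        for (String c : argClasses) {
            if (!loaded.contains(c)) {
                waitingOn.computeIfAbsent(c, k -> new ArrayList<>())
                         .add(new Pending(signature, argClasses));
                return;
            }
        }
        registered.add(signature);
    }

    /** Last step of class registration: retry every method held back for this class. */
    void classLoaded(String className) {
        loaded.add(className);
        List<Pending> held = waitingOn.remove(className);
        if (held != null) {
            for (Pending p : held) addMethod(p.signature, p.argClasses);
        }
    }

    List<String> registeredMethods() { return registered; }

    public static void main(String[] args) {
        DeferredRegistrationDemo d = new DeferredRegistrationDemo();
        d.classLoaded("A");
        d.addMethod("m3(A)", java.util.Arrays.asList("A"));
        d.addMethod("m3(C)", java.util.Arrays.asList("C"));   // C not loaded: deferred
        System.out.println(d.registeredMethods());
        d.classLoaded("C");                                   // releases m3(C)
        System.out.println(d.registeredMethods());
    }
}
```

A method waiting on several unloaded classes is simply parked again under the next missing class when it is retried.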


If all of the argument types for the method are
already registered with the DTF, then we proceed to
register the method. We provide a methodblock pointer
that we want the framework to return if this method
is the dispatched target. We bundle up the DTF_Type
values found in the ClassClass structures for each
argument class (including the receiver argument) and
pass them to the framework. The framework returns a
DTF_Behavior pointer that we store in the methodblock.


14 As mentioned above, our DTF-based systems do not permit null
as a dispatchable argument. Therefore, this guarantee holds.


Dispatch becomes a very simple operation. We build
an array of the DTF_Type pointers from the arguments
on the Java stack. If we encounter a null argument,
we throw a NullPointerException. The DTF_Type
array, along with the DTF_Behavior pointer from the
compiler-selected method, allows the framework to locate
the methodblock pointer that we had previously
registered.


We expect that the returned methodblock pointer is
the method for multi-dispatch. We validate it against
the compiler-selected method. If the return type has
changed, we abort the dispatch and throw an
IllegalReturnTypeChange exception. Otherwise, we call the
found method's original invoker and return its value as
the result of the interpreter’s call to a method invoker.


Single Receiver Projections. Single Receiver Projections
(SRP) [16] is a technique that considers a multi-dispatch
as a request for the joint most specific method
available on each argument. For a given argument
position and type, an ordered (most-specific to least-specific)
vector of potential methods is maintained. The vectors
for all the argument positions are intersected to provide
an ordered vector of all applicable methods. Because of
the ordering, this vector can be quickly searched for the
most applicable method.
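The intersection idea behind SRP can be sketched with bit vectors. The following is a simplified illustration under several assumptions and is not the DTF code: method bodies are numbered so that a more specific body always receives a lower index (obtained here by sorting on the total inheritance depth of the declared classes), each per-position vector is an uncompressed java.util.BitSet computed lazily, and ambiguity detection is omitted. The names SrpDemo and Body are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SrpDemo {
    static class A {}
    static class B extends A {}
    static class C extends B {}

    /** A multi-method body: declared parameter classes plus a label standing in for the code. */
    static final class Body {
        final String label;
        final Class<?>[] params;
        Body(String label, Class<?>... params) { this.label = label; this.params = params; }
    }

    private final List<Body> bodies;                        // index order: most specific first
    private final List<Map<Class<?>, BitSet>> projections;  // one cache per argument position

    SrpDemo(int arity, Body... all) {
        bodies = new ArrayList<>(Arrays.asList(all));
        // Deeper declared classes sort first, so a more specific body gets a lower index.
        bodies.sort(Comparator.comparingInt((Body b) -> -totalDepth(b)));
        projections = new ArrayList<>();
        for (int i = 0; i < arity; i++) projections.add(new HashMap<>());
    }

    private static int totalDepth(Body b) {
        int depth = 0;
        for (Class<?> p : b.params) {
            for (Class<?> c = p; c != null; c = c.getSuperclass()) depth++;
        }
        return depth;
    }

    /** Projection for one position: bodies whose declared class there accepts the dynamic class. */
    private BitSet projection(int pos, Class<?> dynamicType) {
        return projections.get(pos).computeIfAbsent(dynamicType, t -> {
            BitSet bits = new BitSet(bodies.size());
            for (int m = 0; m < bodies.size(); m++) {
                if (bodies.get(m).params[pos].isAssignableFrom(t)) bits.set(m);
            }
            return bits;
        });
    }

    /** Intersects the per-position projections; the lowest set bit is the chosen body. */
    String dispatch(Object... args) {
        BitSet applicable = new BitSet(bodies.size());
        applicable.set(0, bodies.size());
        for (int i = 0; i < args.length; i++) {
            // A null argument fails here with NullPointerException.
            applicable.and(projection(i, args[i].getClass()));
        }
        int first = applicable.nextSetBit(0);
        if (first < 0) throw new IllegalStateException("no applicable method");
        return bodies.get(first).label;
    }

    public static void main(String[] args) {
        SrpDemo f = new SrpDemo(2,
                new Body("f(A,A)", A.class, A.class),
                new Body("f(B,A)", B.class, A.class),
                new Body("f(B,B)", B.class, B.class));
        System.out.println(f.dispatch(new B(), new C()));
        System.out.println(f.dispatch(new B(), new A()));
        System.out.println(f.dispatch(new A(), new C()));
    }
}
```

With a specificity-consistent numbering, finding the most specific applicable body reduces to locating the first set bit of the intersection, which is the property the ordering in the text is meant to provide.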


SRP uses a uni-dispatch technique to maintain the
vector of potential methods for each individual argument.
These vectors are typically compressed to conserve
space. Many different compression techniques are
known: row displacement, selector coloring [2], and
compressed selector table indexing [25]. Our
implementation uses selector coloring, because timing
experiments [17] indicate that this technique provides the
fastest dispatch times.


7 Future Work 


Our MSA and tuned SRP dispatchers are the most complete.
They support null as a dispatchable argument,
multi-dispatch on other invoke bytecodes¹⁵, widening
of primitive dispatchable arguments, and multi-threaded
dispatch. Our table-framework-based dispatchers do not
currently support all of these facilities. Adding them
would provide additional flexibility and allow them to
fully support the Java programming language semantics.
In particular, we have a two-table design that will allow
one thread to dispatch through an existing table, while
we register additional methods and/or classes to a new
table.


15 Signaled by implementing the empty interfaces
StaticMultiDispatchable and SpecialMultiDispatchable.


Our custom SRP code implements multi-dispatch as a
critical section, protected by a mutual-exclusion lock.
We have devised, but not as yet implemented, a technique
which would eliminate the lock overhead (approximately
0.38 µs for every multi-dispatch) and allow concurrent
multi-dispatch. The trade-off is that every thread
would need to halt while the multi-dispatch tables are
being updated.
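One way to picture the two-table design mentioned above, in which dispatch proceeds through an existing table while registrations go to a new one, is a copy-on-write table published through a single reference. The sketch below is an illustration of that general idea only; it is not the authors' design, whose details the text does not give, and the class TwoTableDemo with its string keys and targets is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

class TwoTableDemo {
    // The currently published table; a published map is never modified again.
    private final AtomicReference<Map<String, String>> current =
            new AtomicReference<>(new HashMap<>());

    /** Read path: no lock, just a lookup in whichever table is published right now. */
    String dispatch(String key) {
        return current.get().get(key);
    }

    /** Write path: copy the old table, add the entry, then publish the copy atomically. */
    void register(String key, String target) {
        Map<String, String> old;
        Map<String, String> next;
        do {
            old = current.get();
            next = new HashMap<>(old);
            next.put(key, target);
        } while (!current.compareAndSet(old, next));
    }

    public static void main(String[] args) {
        TwoTableDemo table = new TwoTableDemo();
        table.register("m3(B)", "body of m3(B)");
        System.out.println(table.dispatch("m3(B)"));
        System.out.println(table.dispatch("m3(C)"));   // not registered yet
    }
}
```

Readers never block in this arrangement, at the price of copying the table on every registration, so it suits workloads where dispatch is far more frequent than class loading.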


The OpenJIT support for multi-dispatch is still primi- 
tive; in particular, we eliminate all inlining actions. This 
is a conservative approach and one can identify situa- 
tions where inlining in multi-dispatch Java would pro- 
vide correct results. Identifying these opportunities will 
yield higher overall performance. 


Other multi-dispatch techniques exist, including com- 
pressed n-dimensional tables [1, 12], look-up au- 
tomata [9, 10], and efficient multiple and predicate dis- 
patch [7]. A comprehensive exploration of these tech- 
niques using Java is incomplete at this time. 


Another significant improvement for multi-dispatch is to
incorporate our code testing tool into the javac
compiler. At this time, MDLint exists as a separate
executable which will recognize and warn the programmer
about common ambiguities and difficulties. It
analyzes a complete application and identifies the code
sections where the programmer could invoke an ambiguous
method, or have a conflicting return type.


Our reference implementation, MSA, supports
multi-dispatch on all method types (instance, static,
interface, private, etc.), except constructors. Because
the same bytecode is used to invoke a constructor in the
superclass and a constructor with different arguments,
we cannot distinguish the two possibilities. This issue
is a specific instance of the need to apply a super to
an argument other than the receiver. Fortunately, in our
experience, this requirement does not arise in common
programming practice (except for constructors).


Our tuned SRP implementation allows our dispatch
tables to identify only those types that are
multi-dispatched. This lazy type numbering is reversible,
allowing the tables to shrink as classes are unloaded.
In turn, multi-methods can revert to lower arity
multi-dispatch (or even uni-dispatch). We see great promise in
this technique for long-lived Java server applications.


The DTF framework contains another dispatcher,
Multiple Row Displacement [22] (MRD), that operates 15%
faster than SRP. Therefore, we expect that dispatch could
be enhanced to provide even lower latency by applying
this technique. Unfortunately, MRD currently does not
support incremental dispatch table updates in the same
way that SRP does. In a dynamic environment such as
Java, incremental updating of dispatch tables is
desirable. Enhancing MRD to support incremental updates is
another research priority.


Last, our marker interface MultiDispatchable de- 
notes that each method in a given class is to be multi- 
dispatched. Our JVM relies on this tag only to inform 
it about which methods are eligible for multi-dispatch. 
Therefore, without changing our multi-dispatch imple- 
mentation, alternate Java syntax would allow us to se- 
lectively mark individual methods (and their overriding 
multi-methods) as multi-dispatchable, rather than entire 
classes. We would like to explore the space of conserva- 
tive language extensions to expose this feature. 


8 Related Work 


Others have attempted to add multi-dispatch to 
Java through language preprocessors. Boyland and 
Castagna [3] provide an additional keyword parasite to 
mark methods which should have multi-dispatch proper- 
ties. They effectively translate these methods into equiv- 
alent double-dispatch Java code. By translating directly 
into compiled code, they apply a textual priority to avoid 
the thorny issue of ambiguous methods. Unfortunately, 
the parasitic method selection process is a sequence of 
several dispatches to search over a potentially exponen- 
tial tree of overriding methods. 


The language extension and preprocessor approach has
other limitations. First, existing tools do not support
the extensions; for example, debuggers do not elide the
automatically generated double-dispatch routines.
Second, instance methods appear to only take arguments
that are objects, which is too limiting. Our experience
with Swing shows that existing programs often
double dispatch on literal null and array arguments and
pass primitive types as arguments; multi-methods need
to support these non-object types. Third, preprocessors
limit code reuse and extensibility; adding multi-methods
to an existing behaviour requires either access to the
original source code or additional double-dispatch
layers.


Chatterton [8] examines two different multi-dispatch 
techniques in mainstream languages: C++ and Java. 
First, he considers providing a specialized dispatcher 
class. Each class that participates as a method receiver 
must register itself with the dispatcher. To relieve the 
programmer of this repetitive coding process, he pro- 
vides a preprocessor that rewrites the Java source to in- 
clude the appropriate calls. Each method, marked with 
the keyword multi, is also expanded by the preprocessor 
into many individual methods, one for each combina- 


tion of classes (and superclasses). A method invocation 
is replaced by a call to the dispatcher which searches via 
reflection for an exact match. That method is then in- 
voked. This system suffers from exponential blowup of 
methods. 


Chatterton’s second approach examines the performance 
of various double dispatch enhancements. He pro- 
vides a modified C++ preprocessor which analyses the 
entire Java program. It can build a number of dif- 
ferent double-dispatch structures, including cascaded 
and nested if...else-if...else statements, inline 
switch statements, and simple two-dimensional tables. 
Again, he expands every possible argument-type com- 
bination in order to apply fast equality tests rather than 
slow subtype checks. A significant restriction is that full- 
program analysis is required. This defeats the ability 
to use existing libraries and diminishes Java’s dynamic 
class loading benefits. 


One interesting language for multi-dispatch is Leavens
and Millstein’s Tuple [18]. They describe a language
“similar in spirit to C++ and Java” that permits the
programmer to specify at each call-site the individual
arguments that will be considered for multi-dispatch. This
paper does not describe an implementation; it appears to
be a model of potential syntax and semantics only. A
future project might be to implement their syntax
specifically in the Java environment. In particular, a
simple syntax extension would allow super method
invocations on arbitrary multi-dispatch arguments.


Another recent development is MultiJava [11]. There,
the authors extend the Java language with additional
syntax to support open classes and multi-dispatch.
The MultiJava compiler emits double-dispatch type-case
bytecodes for invocations of the open-class methods and
multi-methods. The emitted bytecode is accepted by
standard JVMs, but suffers a substantial overhead from
interpreting slow subtype-testing bytecodes.
Unfortunately, multi-dispatch can only apply to methods defined
using the open-class syntax and only within the program
text that imports the open-class definitions. If subclasses
wish to further specialize the multi-methods, additional
open-class definitions are required. Compilation of these
further open-subclasses may result in multiple layers of
type-case double-dispatch. Internally, MultiJava inlines
the multi-method bodies into a static method in a
separate anchor class; this means that the multi-methods
disappear from the binary code and become invisible to
the reflective subsystem in Java. Finally, MultiJava is a
paper design at this time¹⁶, so performance comparisons
are not possible.


16 Personal communication at OOPSLA 2000. 





USENIX Association 


6th USENIX Conference on Object-Oriented Technologies and Systems 




9 Concluding Remarks 


We have presented the design and implementation of 
an extended Java Virtual Machine that supports multi- 
dispatch. This is the first published description of how 
to implement arbitrary-arity multi-dispatch in Java. In 
contrast to the more verbose and error-prone double- 
dispatch technique, currently found in the AWT (Fig- 
ure 2), multi-dispatch typically reduces the amount of 
programmer-written code and generally improves the 
readability and level of abstraction of the code. 


Our approach preserves both the performance and se- 
mantics of the existing dynamic uni-dispatch in Java 
while allowing the programmer to select dynamic multi- 
dispatch on a class-by-class basis without any language 
or compiler extensions. The changes to the JVM it- 
self are small and highly-localized. Existing Java com- 
pilers, libraries, and programs are not affected by our 
JVM modifications and the programs can achieve per- 
formance comparable to the original JVM (Table 2). 


In a series of micro-benchmarks, we showed that our 
prototype implementation adds no performance over- 
head to dispatch if only uni-dispatch is used (Table 2) 
and the overhead of multi-dispatch can be competitive 
with explicit double dispatch (Table 4). 


We have also introduced and implemented an extension
of the Java Most Specific Applicable (MSA) static
multi-dispatch algorithm for dynamic multi-dispatch. In
addition, we have performed the first head-to-head
comparison of table-based multi-dispatch techniques
implemented in a mainstream language. In particular, we
implemented Single Receiver Projections (SRP). Overall,
our tuned SRP implementation performs as well as (or
better than) programmer-targeted multi-dispatch. With
performance improvements in concurrency, we expect our
tuned system to out-perform type-case double dispatch.


Abstract


Object-oriented languages generally include some
form of dynamic dispatch; that is, in the absence
of precise compile-time type information, they
perform a run-time selection of the appropriate
function body (or method) from a set of candidates.
Existing single-dispatch languages restrict dynamic
dispatch to the object receiving the message.


Such languages exhibit a conflict between the goals
of providing an extensible set of types and providing
an extensible set of operations that can be
performed on these types. We show that this conflict
is a consequence of the restriction of dynamic
dispatch to the receiver object. We also demonstrate
that this conflict can be resolved by introducing
a generalized form of single dispatch (thus avoiding
the complexity of multiple dispatch). On this
evidence, we argue that dispatch technique should
be decoupled from membership in a class and access
to its representation.


1 Introduction 


Inheritance helps a programmer create a set of
classes for distinct yet similar types of objects.
When inheritance is used for subtyping [13], the
superclass lists the messages that must be handled by
its subclasses. Code that uses superclass messages
will work for subclass objects, as long as these
objects respond properly to all messages listed in the
superclass. Such code will also work without
modification (or even recompilation) with objects of
subclasses that are added to the system later.


This ability to reuse code as new kinds of objects are 
added to the system is touted as one of the major 
advantages of the object-oriented approach. How- 
ever, it comes at the expense of our ability to reuse 
code as new operations are added to a system. Con- 
sider the general problem of designing software in 
which various interpretations (i.e. methods or func- 
tions) are defined for various kinds of objects (i.e. 
classes), as discussed by Harrison and Ossher [10], 
Krishnamurthi, Felleisen, and Friedman [11], and 
Appel [2, Section 4.2]. For example, Appel focuses 
on the design of an abstract syntax tree (A.S.T.) 
for a compiler: The different kinds of objects corre- 
spond to various program structures including ex- 
pressions, statements, and declarations, while the 
different interpretations include static checks (such 
as type checking), various optimizations, or code 
generation for various architectures. 


Such a system can be designed in the traditional 
object-oriented style, which lets us add new kinds 
of objects. However, new interpretations must be 
added to the superclass, requiring modification of 
existing source code and recompilation. The need 
to add operations to the class also violates a ba- 
sic principle of data encapsulation, that each class 
should be defined with a minimal set of operations 
and edited only for a redesign, not a reuse. 


We could, of course, abandon the object-oriented
style, and adopt a style in which each function
contains code for every type of object it could operate
on. This lets us add new interpretations, but the
introduction of a new kind of object forces us to edit
each of the existing functions. We have once again
prevented encapsulation and reuse without
recompilation, and lost other benefits of object-oriented
style as well. Appel argues that the latter style
is more appropriate for his A.S.T. example, while
the former is more appropriate for classic
object-oriented systems (such as graphical user interfaces).
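The two styles can be contrasted on a toy expression tree. This is a minimal sketch written in Java rather than the C++ used by the paper; the node classes Num, Add, and Neg and the interpretations eval and show are hypothetical and are not taken from the paper's A.S.T. example.

```java
class TwoStylesDemo {
    // Object-oriented style: every node class carries its own eval interpretation.
    static abstract class Expr {
        abstract int eval();
    }
    static class Num extends Expr {
        final int value;
        Num(int value) { this.value = value; }
        int eval() { return value; }
    }
    static class Add extends Expr {
        final Expr left, right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
        int eval() { return left.eval() + right.eval(); }
    }
    // A new kind of node is added without touching Expr, Num, or Add.
    static class Neg extends Expr {
        final Expr operand;
        Neg(Expr operand) { this.operand = operand; }
        int eval() { return -operand.eval(); }
    }

    // Function style: a new interpretation is added without touching the classes,
    // but it needs one case per node kind and must be edited when a kind is added.
    static String show(Expr e) {
        if (e instanceof Num) return Integer.toString(((Num) e).value);
        if (e instanceof Add) return "(" + show(((Add) e).left) + " + " + show(((Add) e).right) + ")";
        if (e instanceof Neg) return "-" + show(((Neg) e).operand);   // added along with Neg
        throw new IllegalArgumentException("unknown node kind");
    }

    public static void main(String[] args) {
        Expr e = new Add(new Num(1), new Neg(new Num(2)));
        System.out.println(show(e) + " = " + e.eval());
    }
}
```

Adding eval-like operations means editing Expr and every subclass, while adding Neg-like classes means editing show and every similar function, which is the tension the text describes.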


We demonstrate that this choice of styles represents 
a limitation of traditional object-oriented languages, 
not a fundamental design choice. Specifically, it is a 
consequence of the restriction of dynamic dispatch 
to the methods listed in a class. If we relax this 
restriction, we can create a system in which exist- 
ing code can be reused (without access to source 
code) as both new kinds of objects and new inter- 
pretations are added. We show that it is possible 
to allow dynamic dispatch for functions not listed 
in a class (which we call accessory functions of the 
class), and demonstrate that accessory functions can 
be implemented efficiently. 


This paper is organized as follows: We begin, in 
Section 2, with a brief review of the use of dynamic 
dispatch and its impact (both positive and negative) 
on code reuse. This section also covers the imple- 
mentation of dynamic dispatch in C++. In Section 
3, we spell out the goals that we intend to achieve 
by generalizing dynamic dispatch, describe the se- 
mantics (and C++ syntax) for accessory functions, 
and show that accessory functions can be used to 
enhance reuse in our example program. We then 
discuss, in Section 4, the relationship of accessory 
functions to other language properties such as sup- 
port for data encapsulation. In Section 5, we briefly 
discuss the implementation of accessory functions in 
C++. Finally, we discuss related work in Section 6, 
and give our conclusions in Section 7. 


2 Dynamic Dispatch and Reuse 


One argument made by advocates of object-oriented 
languages is that object-oriented programming can 
facilitate code reuse. A well-designed class or hier- 
archy of classes can be reused without knowledge 
of, or access to, its implementation (just as a well- 
designed function or procedure can be reused in 
other languages). In this paper, we will be con- 
cerned with two types of reuse of classes. In the 
first, which we call reuse by inheritance, a program- 
mer represents a new kind of data by making an 
extension of some existing data type. In the sec- 
ond, which we call reuse in a function, a programmer 
uses a class in the implementation of a new function 
(perhaps for a local variable or parameter). 


We will focus on reuse that can be accomplished 
without modification of the source code that is to 
be reused. This is obviously important if software 
is distributed without source code, or if program- 
mers are not able to modify the source code. It 
also prevents unnecessary code management com- 
plexities when source code is available. If several 
groups of programmers each reuse the same class, 
their modifications to that class must be merged, 
and the merged code must be tested, if these exten- 
sions are ever to be used together. 


Note that reuse by inheritance (as defined above) 
can occur even in languages that do not support in- 
heritance directly. Consider for example the task of 
representing various kinds of expression nodes in a 
compiler’s abstract syntax tree (A.S.T.). (This ex- 
ample was adapted from [2]; we have focused on a 
simple method that can be understood with minimal 
knowledge of compiler construction.) In languages 
like C, the programmer can use a single struct to 
represent all kinds of expression nodes, distinguish- 
ing among the different kinds of nodes by including 
a kind field in the struct and using a switch state- 
ment to select code that is appropriate for each kind 
of expression node. Reuse by inheritance occurs 
if the programmer adds a new kind of expression 
node (perhaps because a new kind of expression has 
been added to the language). However, this reuse 
requires access to (and modification of) the exist- 
ing code — the programmer must add a new case to 
the switch statements in existing functions, even if 
existing cases do not need to be modified. 
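As a minimal sketch of the kind-field style described above (it is not taken from [2]; the names Kind, Node, and the field layout are assumptions made for illustration, and the code is written in C++ for consistency with the rest of the paper although it uses only a C-style struct):

```cpp
#include <string>

// Hypothetical node layout: one struct represents every kind of expression.
enum Kind { NUM, PLUS };

struct Node {
    Kind kind;
    int value;          // used when kind == NUM
    const Node *lhs;    // used when kind == PLUS
    const Node *rhs;    // used when kind == PLUS
};

// Each function over nodes selects its code with a switch on the kind field.
std::string print_rep(const Node &n)
{
    switch (n.kind) {
    case NUM:
        return std::to_string(n.value);
    case PLUS:
        return "(" + print_rep(*n.lhs) + "+" + print_rep(*n.rhs) + ")";
    }
    // Reached only for a kind that was added without editing this switch.
    return "?";
}
```

Adding a new kind of node (say, a product) means adding an enumerator and then editing the switch in print_rep and in every other function written this way, which is exactly the modification of existing code discussed above.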


In an object-oriented language like C++, the pro- 
grammer can define the original kinds of expression 
nodes with a collection of classes, each of which in- 
herits from an “abstract superclass” Exp (shown in 
C++ in Figure 1). The abstract superclass gives 
methods that are shared by all kinds of expressions, 
such as a print_rep method to produce a string 
that gives a printable representation for any kind of 
expression node. These methods must be defined for 
every subclass of Exp, and we can therefore request 
the printable representation of any object denoted 
by a reference of type Exp & (which must be of a 
class derived from Exp). Methods that are specific 
to one kind of node are defined only in the appropri- 
ate subclass (and thus cannot be requested through 
references of type Exp & in statically checked lan- 
guages like C++). 


    class Exp {  // abstract superclass
    public:
        virtual string print_rep() = 0;
    private:
        // ...
    };

    class Num : public Exp {
    public:
        int value();
        virtual string print_rep();
    private:
        // ...
    };

    string Num::print_rep()
    {
        // convert integer
        // to ASCII string
        return itoa(value());
    }

    class Plus : public Exp {
    public:
        Exp &lhs();
        Exp &rhs();
        virtual string print_rep();
    private:
        // ...
    };

    string Plus::print_rep()
    {
        return
            "(" + lhs().print_rep() +
            "+" + rhs().print_rep() + ")";
    }


Figure 1: Dynamic Dispatch Example 


We rely on the fact that we can request the 
print_rep for any object referred to by an Exp & in 
the print_rep method for class Plus. This method 
uses the printable representation for the left and 
right operands of the sum. Dynamic dispatch en- 
sures that the print_rep method for the correct 
class is used, even though the compiler cannot deter- 
mine statically which method will be chosen. This 
approach lets the programmer add new kinds of 
nodes by deriving a new class (with an appropri- 
ate print_rep) for each new kind of node. This 
does not require any modification of existing source 
code, and dynamic dispatch will ensure that the new 
class’s print_rep is used for the new nodes, even in 
existing code (such as the print_rep method for 
class Plus). 
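To make reuse by inheritance concrete, the following is a compilable sketch in the spirit of Figure 1. The constructors, the data members, the use of std::to_string in place of itoa, and the added class Times are all assumptions introduced for illustration; they do not appear in the original figure.

```cpp
#include <string>

class Exp {  // abstract superclass
public:
    virtual ~Exp() {}
    virtual std::string print_rep() = 0;
};

class Num : public Exp {
public:
    explicit Num(int v) : v_(v) {}
    int value() { return v_; }
    virtual std::string print_rep() { return std::to_string(value()); }
private:
    int v_;
};

class Plus : public Exp {
public:
    Plus(Exp &l, Exp &r) : l_(l), r_(r) {}
    Exp &lhs() { return l_; }
    Exp &rhs() { return r_; }
    virtual std::string print_rep()
    {
        // Dynamic dispatch picks the operand's own print_rep at run time.
        return "(" + lhs().print_rep() + "+" + rhs().print_rep() + ")";
    }
private:
    Exp &l_;
    Exp &r_;
};

// Hypothetical node added later, without editing any class above.
class Times : public Exp {
public:
    Times(Exp &l, Exp &r) : l_(l), r_(r) {}
    virtual std::string print_rep()
    {
        return "(" + l_.print_rep() + "*" + r_.print_rep() + ")";
    }
private:
    Exp &l_;
    Exp &r_;
};
```

A Plus node whose right operand is a Times node prints correctly even though Plus::print_rep was written before Times existed.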


Dynamic dispatch shifts the responsibility of select- 
ing the appropriate print_rep code from the pro- 
grammer to the programming language. In single- 
dispatch languages like C++ and Java, dynamic dis- 
patch can be implemented by associating, with each 
object, a table of pointers to the code for each of 
the object’s methods. In our example, each object 
of a class that inherits from Exp has a pointer to its 
print_rep method at a fixed offset in its dispatch 
table; to perform a call to print_rep when the type 
of the object is not known, the compiler can generate an 
indirect function call using the table. 


Note that it is possible for the programmer to imple- 
ment dynamic dispatch in non-object-oriented lan- 
guages that allow pointers to functions, such as C. 
However, this requires that the programmer under- 
take the tedious and potentially error-prone task of 
initializing and using the tables of function pointers. 
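A hand-built dispatch table of this kind might look like the sketch below. It is only an illustration under assumed names (VTable, Node, make_num, make_plus); real compilers lay out their tables differently, and the code is written in C++ although it avoids classes and virtual functions.

```cpp
#include <string>

struct Node;

// Hand-written dispatch table: one slot per dynamically dispatched operation.
struct VTable {
    std::string (*print_rep)(const Node &);
};

struct Node {
    const VTable *vtable;   // must be initialized correctly for every object
    int value;              // used by number nodes
    const Node *lhs;        // used by sum nodes
    const Node *rhs;        // used by sum nodes
};

// The indirect call that a compiler would generate for a virtual function.
std::string print_rep(const Node &n)
{
    return n.vtable->print_rep(n);
}

std::string num_print_rep(const Node &n)
{
    return std::to_string(n.value);
}

std::string plus_print_rep(const Node &n)
{
    return "(" + print_rep(*n.lhs) + "+" + print_rep(*n.rhs) + ")";
}

// One table per kind of node, filled in by hand.
const VTable num_vtable = { num_print_rep };
const VTable plus_vtable = { plus_print_rep };

Node make_num(int v)
{
    Node n = { &num_vtable, v, nullptr, nullptr };
    return n;
}

// The operands must outlive the returned node, which stores their addresses.
Node make_plus(const Node &l, const Node &r)
{
    Node n = { &plus_vtable, 0, &l, &r };
    return n;
}
```

Every mistake that the compiler would otherwise rule out (a forgotten table pointer, a function placed in the wrong slot) becomes possible here, which is the error-prone work referred to above.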


Unfortunately, the use of inheritance and dynamic 
dispatch inhibits reuse in a function. This is a con- 
sequence of the fact that only methods listed in a 
class can be dynamically dispatched based on that 
class. Consider what would happen if we wish to 
add a new pass to our compiler (such as an opera- 
tion to interpret an expression), rather than a new 
type of expression node. If we had used a single 
class with a kind field, we could simply create a new 
function that takes an expression node, checks the 
kind field, and interprets the node in the appropri- 
ate way. However, if we wish to add this operation 
to the collection of classes in Figure 1, we must edit 
the class definitions to add an interpret method. 


Editing the existing class definitions would be 
appropriate if we were redesigning, rather than 
reusing, these classes. However, we do not believe 
that every new use that requires dynamic dispatch 
should be considered a redesign: If this were the 
case, the author of a class would have the respon- 
sibility of enumerating all cases in which dynamic 
dispatch is needed for objects of that type. 


There are a number of other ways to add an 
interpret method, each of which we consider 
unsatisfactory. First, we could introduce an 
interpret function that is not part of the Exp 
classes (in Java, it must be a method of some other 
class, such as a class Interpreter). This function 
could use typeid (in C++) or instanceof (in Java) 
in a series of if statements to select the appropri- 
ate code for the kind of node being interpreted. In a 
language without an equivalent of typeid, the pro- 
grammer can add a kind field (or operation). In 
either case, this approach simply sets up a future 
problem with reuse by inheritance - any program- 
mer adding a new kind of expression node must edit 
the code for interpret to work with the new type. 
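The first alternative can be sketched as follows. The class bodies are simplified stand-ins for those of Figure 1, and the later-added class Neg and the choice to throw std::logic_error for an unknown node are assumptions made for illustration.

```cpp
#include <stdexcept>
#include <typeinfo>

class Exp {
public:
    virtual ~Exp() {}
};

class Num : public Exp {
public:
    explicit Num(int v) : v_(v) {}
    int value() { return v_; }
private:
    int v_;
};

class Plus : public Exp {
public:
    Plus(Exp &l, Exp &r) : l_(l), r_(r) {}
    Exp &lhs() { return l_; }
    Exp &rhs() { return r_; }
private:
    Exp &l_;
    Exp &r_;
};

// Non-member interpret: selects code with an explicit chain of type tests.
int interpret(Exp &e)
{
    if (typeid(e) == typeid(Num)) {
        return static_cast<Num &>(e).value();
    }
    if (typeid(e) == typeid(Plus)) {
        Plus &p = static_cast<Plus &>(e);
        return interpret(p.lhs()) + interpret(p.rhs());
    }
    // A node class added later lands here until this function is edited.
    throw std::logic_error("interpret: unknown kind of expression node");
}

// Hypothetical node added later by another programmer.
class Neg : public Exp {
public:
    explicit Neg(Exp &e) : e_(e) {}
    Exp &operand() { return e_; }
private:
    Exp &e_;
};
```

The classes are untouched, but interpret fails at run time for a Neg node, so the author of Neg must edit interpret: reuse in a function has been bought at the price of reuse by inheritance.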


We could also use inheritance to produce new sub- 
classes that add an interpret operation (deriving 
a class Exp_with_interpret from class Exp). How- 
ever, this introduces spurious uses of multiple inheri- 
tance, which we consider highly undesirable (though 
we have no problem with legitimate uses of multiple 
inheritance): If we are to apply interpret to each 
new kind of node through an Exp reference, then 
the new node classes must share a common super- 
class with an interpret function, which introduces 
a second superclass for the new node classes. 


Thus, if dynamic dispatch is provided only to func- 
tions listed in the class, we are forced to choose 
between allowing reuse in functions (if we use an 
explicit switch) or reuse by inheritance (if we use 
dynamically dispatched methods). To allow both 
kinds of reuse, we must allow dispatch for functions 
outside of the class. This is allowed in some lan- 
guages that provide multiple dispatch [3, 9]. How- 
ever, multiple dispatch has a higher run-time cost 
than single dispatch: techniques based on complete 
dispatch tables may require large tables, and other 
methods do not provide constant-time dispatch [1]. 
Accessory functions provide both kinds of reuse 
without the added complexity and cost of multiple 
dispatch. 


Multiple dispatch can also be introduced as a pro- 
gramming technique rather than a language feature 
(for example, by using the visitor design pattern). 
This also introduces unnecessary overhead, and is 
less flexible than a compiler-generated dispatch, as 
we will see in Section 6. 


6th USENIX Conference on Object-Oriented Technologies and Systems 


    // "pure virtual" accessory function
    // for superclass
    int interpret(virtual Exp &) = 0;

    int interpret(virtual Num &n)
    {
        return n.value();
    }

    int interpret(virtual Plus &p)
    {
        return interpret(p.lhs())
             + interpret(p.rhs());
    }


Figure 2: Accessory Function Example 


3 Accessory Functions 


Although no current single-dispatch language does 
so, it is in principle possible to allow dynamic dis- 
patch on a parameter other than the receiver of the 
message. We call a function that does so an acces- 
sory function of the class involved in dispatch. Fig- 
ure 2 shows how accessory functions can be used to 
add a dynamically dispatched interpret function 
to our A.S.T. example of Figure 1 (using a nota- 
tion based on C++). The rest of this section gives 
our design goals, and gives possible syntax and se- 
mantics for integrating accessory functions into a 
C++ -like language. 


Our goal is to provide the following properties of 
programs written with accessory functions: 


• Accessory functions can be added to a group 
of classes without editing (or even reading) the 
source for the classes. To avoid violating the 
principle of data encapsulation, accessory func- 
tions do not have access to the private data of 
any classes they are not listed in. For exam- 
ple, the interpret functions of Figure 2 do not 
have access to the private data of any class, in- 
cluding the classes for the abstract syntax tree. 


• New classes can be added to a program that 
uses accessory functions without any need to 
edit existing functions or classes. In other 
words, we wish to allow both reuse in a function 
(the previous item) and reuse by inheritance. 


• Accessory functions can be dispatched as effi- 
ciently as other single dispatch functions, such 
as virtual functions in C++. 


• Except for the change in which argument is 
used for dynamic dispatch, function dispatch 
should follow the rules that exist in the lan- 
guage. 


• The system must be able to produce errors 
about dispatch before the program is executed: 
A user must not see “method not found” errors 
while running a program (this was a design goal 
of C++). 


3.1 Syntax 


We need syntactic mechanisms to identify the pa- 
rameter to be used in dynamic dispatch and to spec- 
ify that a superclass function should be selected dur- 
ing a call in a subclass function. In this article, we 
give a syntax that is an extension of C++, and fo- 
cus on definitions that are appropriate for C++, 
though accessory functions could be added to other 
statically typed single-dispatch object-oriented lan- 
guages. 


We identify an accessory function by using the key- 
word virtual in the declaration of a parameter. We 
consider virtual to be an attribute of a parame- 
ter rather than an attribute of the function itself. 
When virtual is used in the traditional way, we 
say that the member function has a virtual receiver 
(rather than a virtual parameter). Accessory func- 
tions for C++ may be created outside of any class, 
as in Figure 2, or they may be created as members 
(or friends) of one (or more) classes. 


When an accessory function for a subclass needs to 
make use of the superclass function, it gives explicit 
type information for the virtual parameter. We use 
syntax that is similar to type casting for this pur- 
pose (we chose this notation because it produces 
the result that type casting of a reference produces 
for a statically dispatched function). To avoid in- 
troducing a new keyword, we reuse the word “vir- 
tual” for this purpose, i.e., the interpret func- 
tion for Num could call the superclass function (were 
it not pure virtual) with the syntax interpret( 
(virtual Exp) n ). This is only legal if the new 
type is a public superclass of the argument type; its 
effect is analogous to using interpret( (Exp &) n 
) for a statically dispatched function. 


3.2 Restrictions 


We place several restrictions on the definition of ac- 
cessory functions. Most are needed to prevent am- 
biguities that prevent us from selecting between dy- 
namic and static dispatch at compile time. 


• For any function, at most one parameter (in- 
cluding the receiver object) may be virtual. 
This is necessary to ensure that we do not need 
multiple dispatch. Here and in the remainder 
of this paper, we count the receiver object of a 
method as a parameter. 


• No single scope can contain two functions that 
differ only in the dispatch mechanism of a 
parameter: We cannot have f(Exp &) and 
f(virtual Exp &). This restriction is neces- 
sary because it would not be possible to distin- 
guish calls to the two functions. C++ has an 
analogous rule for virtual functions. 


• In any one scope, no two functions with the 
same name and arity (number of arguments) 
are dynamically dispatched on different param- 
eters. This will play an important role in our 
function selection semantics below. 


• All functions with parameters of class C must 
be defined before the execution of code that cre- 
ates an object of class C. In traditional C++ 
environments, all functions are defined before 
program execution begins, and this restriction 
always holds. In environments that allow dy- 
namic loading of classes (such as Java) this 
places restrictions on the relative timing of ob- 
ject creation and the loading of functions. 


• To ease implementation in C++, we only allow 
accessory functions for classes that already have 
at least one virtual function: For example, we 
cannot have f(virtual int &), as int has no 
dispatch table. 


3.3 Function Selection Semantics 


Function dispatch based on the types of multiple 
arguments, whether static or dynamic, raises two 
challenges: We must specify which function body is 
considered the correct choice for any given call, and 
we must provide a way for the program to branch 
to this code efficiently. In this section, we consider 
the question of how to adapt the existing dispatch 
rules of C++ for accessory functions, leaving the 
question of how to branch to this function for 
Section 5. 


The traditional choice of static vs. dynamic dis- 
patch, and the new decision of which argument is 
to be used for dynamic dispatch, must be made at 
compile-time. These decisions are thus based on the 
types of the references used in the call (rather than 
the types of the objects they refer to), and the set of 
functions that are in scope at the point of the call. 
Once the compiler has selected dynamic dispatch on 
a particular object, the true “run-time” type of the 
object will be used in the actual call. 


We ensure that we can statically determine which 
argument is to be used in dynamic dispatch by re- 
quiring that, in any one scope, no two functions with 
the same name and arity (number of arguments) 
are dynamically dispatched on different parame- 
ters. Essentially, we consider dispatch mechanism 
to be an attribute of the message (function name) 
rather than method (function body). Conflicts that 
might arise when two independent projects happen 
to use the same function names must be resolved 
via namespaces. 


This restriction allows us to use the traditional C++ 
approach to dispatch: We select, from the set of 
functions that are in scope, the one with parame- 
ter types that best match the compile-time type 
information about the arguments used in the call. 
If there is no unique best match, we generate an 
error message. We then generate either a dynamic 
dispatch (based on the appropriate parameter type, 
if one parameter is virtual) or static dispatch (if no 
parameter is virtual). 
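The two kinds of information involved can be seen in standard C++ today, where overloaded non-member functions are selected from the static type of a reference while virtual member functions are selected from the run-time type of the object. The sketch below uses assumed names (describe, name, and the two helper functions) and does not use the proposed virtual-parameter syntax.

```cpp
#include <string>

class Exp {
public:
    virtual ~Exp() {}
    virtual std::string name() { return "Exp"; }   // dynamically dispatched
};

class Num : public Exp {
public:
    virtual std::string name() { return "Num"; }
};

// Overloaded non-member functions: chosen at compile time from the
// static type of the reference used in the call.
std::string describe(Exp &) { return "describe(Exp &)"; }
std::string describe(Num &) { return "describe(Num &)"; }

// Hypothetical helpers that see their argument only as an Exp &.
std::string describe_through_base(Exp &e) { return describe(e); }
std::string name_through_base(Exp &e) { return e.name(); }
```

For a Num object n, describe(n) selects the Num overload, while describe_through_base(n) and describe((Exp &) n) select the Exp overload because only the static type is consulted; name_through_base(n) still reaches Num::name. An accessory function would give describe the behavior of name without making it a member.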


Thus, if a group of functions of a given name and 
arity are dispatched on argument a, we produce a 
branch to the function that would have been called 
if all functions with this name and arity had been 
written as (possibly virtual) member functions of 
the classes of their a-th arguments. In other words, 
we generate a branch to the function that would 
have been called if we had violated the encapsula- 
tion of the classes. 


As we will see in Section 5, our implementation al- 
lows us to produce warnings for certain surprising 
behavior that is a consequence of this combination 
of static and dynamic information. 


3.4 Type Casting 


The compiler will not produce a virtual argument by 
applying an implicit type cast to a value (though 
it may still convert a subclass type reference (or 
pointer) to a superclass reference (or pointer)). This 
is an extension of the existing C++ rule that the 
compiler will not apply an implicit cast to produce 
the receiver object. Virtual is generally used in the 
declaration of a pointer or reference type parameter: 
When applied to the declaration of a value param- 
eter, it affects type casting, but not dispatch (since 
complete type information must be present at com- 
pile time). 


The lack of casting for virtual arguments means that 
adding virtual to a parameter of an existing func- 
tion may interfere with the compilation of code that 
had used this function: It may be necessary to add 
an explicit cast where an implicit one had been used 
previously to produce the (non-virtual) argument. 
We believe it would be possible to implement a sys- 
tem that allows implicit casting for accessory func- 
tions, but that such a system could produce highly 
confusing results, as casting is based on the func- 
tions that are in scope, but dispatch is based on all 
compatible functions in the final program. 


3.5 Default Arguments 


The above discussion ignores the issue of default ar- 
guments in C++. We believe these can be handled 
by treating a declaration with a default argument 
as if it were a group of declarations of overloaded 
functions, all but one of which simply supply extra 
arguments and call the original function. 
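As a small illustration of this reading of default arguments (the function names scale and scale2 are hypothetical, and the second name exists only so that both forms can appear in one file):

```cpp
// A declaration with a default argument ...
int scale(int x, int factor = 2) { return x * factor; }

// ... treated as a group of overloaded functions: the full version,
// plus a shorter one that supplies the extra argument and forwards.
int scale2(int x, int factor) { return x * factor; }
int scale2(int x) { return scale2(x, 2); }
```

Under this view, the rules of Section 3.3 apply to each member of the group separately, since each has its own arity.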


4 Encapsulation 


Existing single-dispatch object-oriented languages 
link together the following three properties: (1) The 
class(es) in which a function is defined, (2) The 
class(es) representation(s) that a function can ac- 
cess, and (3) The class that is used in dynamic dis- 
patch of the function. In many languages, these 
three properties are unified in the concept of “the” class 
of a method. C++ allows slightly more flexibility by 
allowing a function to access the representations of 
several classes if it is listed as a friend (or member) 
in each of them, though it can only be dynamically 
dispatched based on the (single) class of which it is 
a member. Even some multiple-dispatch languages 
unify the concept of access and dispatch: For exam- 
ple, in Cecil “a multi-method is granted privileged 
access to all objects of which the multi-method is 
a part, i.e., of the objects that are the method’s 
argument constraints” [5, Section 1.5]. 


The unification of properties (1) and (2) essentially 
defines data encapsulation, which plays an essen- 
tial role in reuse of classes. Since direct access to 
a class’s representation is allowed only from those 
functions included in the class itself, we can rest as- 
sured that uses of the class by other functions (in- 
cluding all “reuse”) will not corrupt any properties 
guaranteed by the class as it was originally writ- 
ten. Implicit in the idea of data encapsulation is 
the principle that programmers will not rampantly 
add operations to a completed class. If new opera- 
tions are added to a class, and thus granted access 
to its representation, we can no longer guarantee 
that the representation cannot be corrupted. To re- 
tain this important property, accessory functions do 
not have access to classes involved in their dispatch 
(unless they are listed as friends of that class, for 
some reason). 


We have proposed that the property of dynamic dis- 
patch be separated from the property of inclusion in 
(and access to) a class. While we originally argued 
that this be done to support reuse, we find it ap- 
pealing for several other reasons. First, it provides 
greater orthogonality of language features. Proper- 
ties (1) and (2) above must remain unified, but dy- 
namic dispatch is now fully independent. Second, 
we believe that accessory functions strengthen lan- 
guage support for data encapsulation. One tenet of 
data encapsulation is that each class should be de- 
fined with a set of operations that is both adequate 
and minimal: 


There can also be too many operations in 
a type... In this case, the abstraction may 
be less comprehensible, and implementa- 
tion and maintenance are more difficult. 
The desirability of extra operations must 
be balanced against the cost of implement- 
ing these operations. If the type is ade- 
quate, its operations can be augmented by 
procedures that are outside the type’s im- 
plementation. [14, Section 4.9.3] 


Stroustrup also discusses this principle [17, Sec- 
tion 11.5.2]. Thus, it can be argued that both the 
interpret and print_rep functions belong outside 
the A.S.T. classes in our motivating example, as 
both can be written efficiently in terms of existing 
operations. However, without accessory functions, 
these operations must be placed inside the class. 


Note that our need for dynamic dispatch for our 
A.S.T. example is not simply an artifact of the fact 
that we have not provided a more abstract way of 
traversing an abstract syntax tree. If we provide 
either an iterator or a traversal function to apply 
arbitrary code to each element of the tree, we still 
find the need to associate certain code with certain 
kinds of A.S.T. nodes. Dynamic dispatch provides 
a simple and efficient mechanism for doing so. Only 
the restriction of dispatch to members of the class 
keeps us from using it in these cases. 


Since accessory functions do not have access to the 
private data of the data structure to which they 
are applied, they cannot save state information in 
this structure. We must, therefore, accumulate any 
needed information in some other way. In our “in- 
terpret” example, information is kept as temporary 
values in the C++ run-time stack; this works be- 
cause we only need to produce a single final value 
(the result of the expression). If we need more com- 
plex information, such as a value associated with 
each node in the tree, we can build up an auxil- 
iary data structure (for example, a second tree that 
contains values that correspond to the nodes in the 
A.S.T.). If we wish to traverse the data structure 
and modify it, we must modify the class (by us- 
ing traditional virtual functions instead of acces- 
sory functions, or making the accessory functions 
into friends of the class). This is consistent with 
the principle that only operations listed in the class 
can access the class’s private data. 
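An auxiliary structure of this kind can be sketched in standard C++ as a side table keyed by node address. The names ValueTable and annotate are assumptions, and dynamic_cast stands in here for the dispatch that an accessory function would provide; the tree classes are simplified stand-ins for those of Figure 1.

```cpp
#include <unordered_map>

class Exp {
public:
    virtual ~Exp() {}
};

class Num : public Exp {
public:
    explicit Num(int v) : v_(v) {}
    int value() { return v_; }
private:
    int v_;
};

class Plus : public Exp {
public:
    Plus(Exp &l, Exp &r) : l_(l), r_(r) {}
    Exp &lhs() { return l_; }
    Exp &rhs() { return r_; }
private:
    Exp &l_;
    Exp &r_;
};

// Side table: per-node values live outside the tree, keyed by node address.
typedef std::unordered_map<const Exp *, int> ValueTable;

// Non-member traversal that records a value for every node it visits,
// using only the public operations of the tree classes.
int annotate(Exp &e, ValueTable &values)
{
    int result = 0;
    if (Num *n = dynamic_cast<Num *>(&e)) {
        result = n->value();
    } else if (Plus *p = dynamic_cast<Plus *>(&e)) {
        result = annotate(p->lhs(), values) + annotate(p->rhs(), values);
    }
    values[&e] = result;
    return result;
}
```

The tree itself is never modified, so no access to private data is needed; the cost is the extra table and the requirement that nodes stay at fixed addresses while the table is in use.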


5 Implementation for C++ 


Given the definitions and rules of Section 3, the im- 
plementation of accessory functions does not pose 
many interesting technical challenges. We simply 
move the existing algorithms for building dispatch 
tables from compile-time to link-time, retaining the 
general principles used for virtual functions in C++: 
each object contains a pointer to a table of all func- 
tions that may be dynamically dispatched based on 
its type, and the compiler statically produces code 
that will locate a given function with a (constant 
time) table lookup (for single inheritance, we sim- 
ply use a fixed offset into the table). 


The restrictions in Section 3.2 can be checked 
trivially as function declarations are processed at 
compile-time. To compile a (possibly dynamically 
dispatched) function call, we start by applying the 
C++ rules for overloaded function selection [17, Sec- 
tion 7.4] to the set of functions that are in scope and 
have the correct name and number of parameters, 
with the restriction that type casting of values can- 
not be used to create a match with a virtual param- 
eter. If there is not a unique match, a compile-time 
error is produced. If there is a unique best match, 
we generate either a regular function call (if the best 
match has no virtual parameter) or a dynamic func- 
tion dispatch. 


If the best match had a virtual parameter, the dy- 
namic dispatch is very much like a C++ virtual 
function call. We know that the virtual argument 
will contain a pointer to a table of dynamically dis- 
patched functions, and generate a load of a func- 
tion pointer from this table, and a branch to this 
address. The presence of accessory functions means 
that we no longer know the size or layout of this ta- 
ble when compiling a single file, but we handle this 
by treating the offset into the table as an undefined 
reference that will be filled in later by the linker. 


Note that we need a separate entry in the dispatch 
table for each virtual parameter position, function 
name, and arity. A group of three-parameter func- 
tions may be dispatched differently from some two- 
parameter functions of the same name; different 
three-parameter functions may have different vir- 
tual parameters (as long as they are not in the same 
scope). 


We rely on the compiler to give the linker a complete 
description of the DAG describing the class inheri- 
tance structure, and a list of all functions (complete 
with parameter types and information about which 
parameters are virtual). We topologically sort the 
inheritance DAG and we apply the algorithm used 
to build C++ virtual function tables, using a new 
offset for each set of statically distinct functions. 


Since the offsets are determined at this stage, we can 
resolve the undefined references produced at com- 
pile time. 
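The fixed-offset lookup can be imitated in standard C++ as in the sketch below. This is an emulation under stated assumptions, not the implementation described above: the per-class AccessoryTable, the accessories() hook, INTERPRET_SLOT, and link_accessory_functions() are hypothetical, and the hook exists only because standard C++ offers no way to extend a class's real dispatch table from outside. In the scheme described above the linker would add the slot to the existing table, so the classes would need no such hook.

```cpp
#include <cstddef>
#include <vector>

class Exp;
typedef int (*IntFn)(Exp &);   // signature shared by this function family

// Stand-in for the part of a dispatch table that holds accessory functions.
struct AccessoryTable {
    std::vector<IntFn> slots;
};

class Exp {
public:
    virtual ~Exp() {}
    virtual AccessoryTable &accessories() = 0;   // emulation hook only
};

class Num : public Exp {
public:
    explicit Num(int v) : v_(v) {}
    int value() { return v_; }
    static AccessoryTable &table() { static AccessoryTable t; return t; }
    virtual AccessoryTable &accessories() { return table(); }
private:
    int v_;
};

class Plus : public Exp {
public:
    Plus(Exp &l, Exp &r) : l_(l), r_(r) {}
    Exp &lhs() { return l_; }
    Exp &rhs() { return r_; }
    static AccessoryTable &table() { static AccessoryTable t; return t; }
    virtual AccessoryTable &accessories() { return table(); }
private:
    Exp &l_;
    Exp &r_;
};

// Offset assigned to the interpret family; at compile time this would be
// an undefined reference, resolved once all functions are known.
const std::size_t INTERPRET_SLOT = 0;

// Call site: load a function pointer at a fixed offset and branch to it.
int interpret(Exp &e)
{
    return e.accessories().slots[INTERPRET_SLOT](e);
}

int interpret_num(Exp &e) { return static_cast<Num &>(e).value(); }

int interpret_plus(Exp &e)
{
    Plus &p = static_cast<Plus &>(e);
    return interpret(p.lhs()) + interpret(p.rhs());
}

// Plays the role of the linker: fills each class's table.
void link_accessory_functions()
{
    Num::table().slots.resize(INTERPRET_SLOT + 1);
    Num::table().slots[INTERPRET_SLOT] = interpret_num;
    Plus::table().slots.resize(INTERPRET_SLOT + 1);
    Plus::table().slots[INTERPRET_SLOT] = interpret_plus;
}
```

After link_accessory_functions() has run, each call to interpret costs one table lookup at a fixed offset plus an indirect branch (and, in this emulation only, one extra virtual call to reach the table).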


This implementation places an increased load on the 
linker, and thus may increase link times. However, 
this is a fundamental consequence of the fact that 
declarations of accessory functions for a given class 
may be spread across several files (unlike the class’ 
virtual functions), not a weakness of our implemen- 
tation. The distribution of accessory functions over 
different files prevents detection by the compiler of 
certain errors that could be detected by traditional 
C++ compilers, such as the instantiation of a class 
with a pure virtual accessory function (one file may 
instantiate an object of a class for which pure virtual 
accessory functions are created in another file). 


It may be possible to reduce the link-time overhead 
somewhat by replacing our implementation with a 
version of Millstein and Chambers’ techniques for 
modular multimethod dispatch [15], restricted to 
the case of single dispatch. We have not investi- 
gated this possibility, since some degree of link-time 
overhead is unavoidable, and our main goal is to 
present a simple implementation that demonstrates 
that we can retain the constant-time nature of the 
dynamic dispatch used in C++. 


6 Related Work 


Snyder [16] and Liskov [13] have also studied con- 
flicts between data encapsulation and other aspects 
of object-oriented programming. They discuss prob- 
lems that arise when subclass operations are given 
access to superclass data or private operations, and 
Snyder [16] observes that a class’s superclasses can- 
not be considered an implementation detail of the 
class in a system that allows multiple inheritance 
without replicating common superclasses. 


Appel [2, Section 4.2], Harrison and Ossher [10], 
Krishnamurthi, Felleisen, and Friedman [11], and 
possibly many others have noted the conflict that 
arises between reuse by inheritance and reuse in a 
function. There are a number of approaches to re- 
solving this problem, which we discuss in order of 
increasing familiarity for programmers familiar with 
C++. 


Some techniques for multiple dispatch (also known 
as multi-methods) [3, 9, 5, 4] could be used to pro- 
vide dispatch on a parameter other than the re- 
ceiver of a message, either directly or by including 
the type needed for dispatch in a new tuple type [12]. 
General multi-method dispatch either requires more 
than constant time per dispatch or excessively large 
dispatch tables [1]. However, recent techniques for 
multimethod dispatch [6] have very low overhead, 
and we believe they would be at least as efficient as 
our system for the case of single dispatch (which, as 
we have noted, is all that is needed to resolve the 
conflict between different kinds of reuse). 


Work on multiple dispatch also differs from ours in 
that it is not focused on the separation of dispatch and 
access. Cecil explicitly retains the unification of dis- 
patch and access, though a change to this rule would 
probably not have any impact on performance. We 
believe the main barrier to the widespread use of 
these techniques to enable both kinds of reuse is the 
tendency of programmers to prefer familiar tech- 
niques and languages. The other approaches to 
solving this problem (including ours) focus on tech- 
niques or language extensions that can be applied to 
C++ or Java. General discussions of techniques for 
multi-method dispatch can be found in [1, Section 
3.2] and [6, Section 3.7]. 


The visitor pattern could be applied to our ab- 
stract syntax tree example: Each tree class 
(such as Plus) would provide a “visit” opera- 
tion that takes a “visitor” parameter, and sends 
the visitor a message that is specific to the tree 
subclass (e.g., Plus::visit(visitor &v) sends 
v.visitPlus(this) ). This approach still interferes 
with the addition of new subclasses, since the visitor 
class must be extended to include a new method for 
each new subclass. The “Extended Visitor” pro- 
tocol [11] fixes this problem, but still has higher 
overhead than a single dispatch accessory function, 
and to some degree shifts the burden of perform- 
ing dispatch back from the compiler onto the pro- 
grammer. It thus creates unnecessary opportunities 
for programmer error, and suffers from limitations 
due to the lack of compiler support. Krishnamurthi, 
Felleisen, and Friedman have developed a language 
named Zodiac to simplify the use of the extended 
visitor pattern, but it is not clear how quickly it 
will be adopted by programmers who are familiar 
with C++ or Java. 
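For comparison, the basic visitor pattern applied to the A.S.T. example might look like the following sketch. The class and member names (Visitor, visitNum, visitPlus, Interpreter, result) are assumptions chosen to match the description above, and the visit operation passes the node by reference rather than as a pointer.

```cpp
class Num;
class Plus;

// The visitor interface names every node class, so a new node class
// forces an edit here and in every existing visitor.
class Visitor {
public:
    virtual ~Visitor() {}
    virtual void visitNum(Num &n) = 0;
    virtual void visitPlus(Plus &p) = 0;
};

class Exp {
public:
    virtual ~Exp() {}
    virtual void visit(Visitor &v) = 0;
};

class Num : public Exp {
public:
    explicit Num(int v) : v_(v) {}
    int value() { return v_; }
    virtual void visit(Visitor &v) { v.visitNum(*this); }
private:
    int v_;
};

class Plus : public Exp {
public:
    Plus(Exp &l, Exp &r) : l_(l), r_(r) {}
    Exp &lhs() { return l_; }
    Exp &rhs() { return r_; }
    virtual void visit(Visitor &v) { v.visitPlus(*this); }
private:
    Exp &l_;
    Exp &r_;
};

// A new pass is a new visitor; no node class is edited.
class Interpreter : public Visitor {
public:
    Interpreter() : result_(0) {}
    int result() { return result_; }
    virtual void visitNum(Num &n) { result_ = n.value(); }
    virtual void visitPlus(Plus &p)
    {
        p.lhs().visit(*this);
        int left = result_;
        p.rhs().visit(*this);
        result_ = left + result_;
    }
private:
    int result_;
};
```

Each node costs two dynamic dispatches (visit, then visitNum or visitPlus), and results must be passed through visitor state because the visit operations return nothing; both points illustrate the overhead and the programmer-managed bookkeeping mentioned above.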


Harrison and Ossher [10] proposed the “Subject- 
Oriented Programming” style. This approach, like 
our accessory functions, can serve as the basis for 
extension of an existing language like C++ (it is 
currently available as a preprocessor for C++ in 
IBM’s Visual Age for C++ Version 4). Instead of 
separating the property of dispatch from presence 
in a class, subject-oriented programming facilitates 
the decomposition of a class into different “subjects” 
that can be developed independently and then com- 
posed. A subject can correspond to one of our ac- 
cessory functions, a group of functions, or functions 
together with associated data (like a class). This ap- 
proach is more general than ours (though not more 
general than some of the multimethod systems), and 
correspondingly raises more new issues for program- 
mers, such as the selection of composition system. 


We have focused on providing a resolution to the 
conflict between reuse by inheritance and reuse in 
a function, while creating the minimal impact on 
programmers who are familiar with the traditional 
object-oriented style. Our extensions can be added 
to C++ by relaxing a single rule (that dynamic dis- 
patch must be based on the receiver of the message). 
A preliminary description of accessory functions ap- 
peared at MASPLAS ’99 [7]. We have also explored 
the possibility of allowing multiple virtual parame- 
ters [8], though this work does not make a significant 
contribution to the existing literature on multiple 
dispatch. 


7 Conclusions 


Current single-dispatch object-oriented languages 
provide dynamic dispatch only for functions listed 
in the class involved in dispatch, even if overloading 
is allowed for other parameters. This property gives 
the author of a class the responsibility of enumer- 
ating all cases in which dynamic dispatch is needed 
for objects of this type. This hinders code reuse
by forcing the designer of a set of types to choose 
between allowing reuse by inheritance (by using dy- 
namic dispatch) and reuse in a function (by using 
explicit switches on the kind of object). 


It is possible and (we believe) desirable to provide 
dynamic dispatch to users of a class hierarchy. In 
other words, we should eliminate the coupling be- 
tween dispatch method and membership in (and ac- 
cess to) a class. This decoupling lets programmers 
achieve both reuse by inheritance and reuse in a 
function. Thus, our “accessory functions” improve 
the support for both reuse and data encapsulation, 
and can be implemented with the same efficient dispatch
algorithms used in current C++ virtual function
selection.
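As a rough analogy only, the decoupling of dispatch from class
membership can be sketched with Python's functools.singledispatch.
This is not the C++ extension proposed in this paper, and it makes
no claim about dispatch cost; the classes Num, Plus, and Neg and
the function evaluate are hypothetical names chosen for the sketch.

```python
from functools import singledispatch


class Num:
    def __init__(self, value):
        self.value = value


class Plus:
    def __init__(self, left, right):
        self.left = left
        self.right = right


@singledispatch
def evaluate(node):
    # Fallback when no case is registered for the node's class.
    raise NotImplementedError(type(node).__name__)


@evaluate.register(Num)
def _evaluate_num(node):
    return node.value


@evaluate.register(Plus)
def _evaluate_plus(node):
    return evaluate(node.left) + evaluate(node.right)


class Neg:
    """A subclass added later, without touching Num, Plus, or evaluate."""

    def __init__(self, operand):
        self.operand = operand


@evaluate.register(Neg)
def _evaluate_neg(node):
    return -evaluate(node.operand)


result = evaluate(Plus(Num(4), Neg(Num(1))))
```

The function evaluate is dispatched on the run-time class of its
argument even though it is not a member of any tree class, and a
new subclass can supply its own case without edits to existing
code.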


Abstract


Distributed object computing (DOC) middleware shields de- 
velopers from many tedious and error-prone aspects of pro- 
gramming distributed applications. Without proper support 
from the middleware, however, it can be hard to evolve distributed
applications after they are deployed. Therefore, DOC
middleware should support meta-programming mechanisms, 
such as smart proxies and interceptors, that improve the adapt- 
ability of distributed applications by allowing their behavior 
to be modified without changing existing software drastically. 

This paper presents three contributions to the study of meta-programming
mechanisms for DOC middleware. First, it illustrates,
compares, and contrasts several meta-programming
mechanisms from an application developer's perspective. Sec- 
ond, it outlines the key design and implementation challenges 
associated with developing smart proxies and portable inter- 
ceptors features for CORBA. Third, it presents empirical re- 
sults that pinpoint the performance impact of smart proxies 
and interceptors. Our goal is to help researchers and develop- 
ers determine which meta-programming mechanisms best suit 
their application requirements. 


1 Introduction 


Motivation: Developers of distributed applications face 
many challenges stemming from inherent and accidental com- 
plexities, such as latency, partial failure, and non-portable 
low-level OS APIs. The magnitude of these complexities,
combined with increasing time-to-market pressures, makes it
increasingly impractical to develop distributed applications 
manually from scratch. Commercial-off-the-shelf (COTS) distributed
object computing (DOC) middleware helps address
these challenges by: 

1. Defining standard higher-level programming abstrac- 
tions, such as distributed object interfaces, that provide loca- 
tion transparency to clients and server components; 


2. Shielding application developers from low-level con- 
current network programming details, such as connection 
management, data transfer, parameter (de)marshaling, endpoint
and request demultiplexing, error handling, multi-threading,
and synchronization; and


3. Amortizing software lifecycle costs by leveraging pre- 
vious development expertise and capturing implementations of 
key patterns in reusable middleware frameworks and common 
services. 


In the case of standards-based DOC middleware, such as 
CORBA [1], these capabilities are realized via an open specifi- 
cation process. The resulting products can interoperate across 
many OS/network platforms and programming languages [2]. 

To date, CORBA middleware has been used successfully to 
enable developers to create applications rapidly that can meet a 
particular set of requirements with a reasonable amount of ef- 
fort. CORBA has been less successful, however, at shielding 
developers from the effects of requirement or environmental 
changes that occur late in an application’s life-cycle, i.e., dur- 
ing deployment and/or at run-time. To address this problem, 
this paper describes and evaluates meta-programming mechanisms,
which improve the adaptability of distributed applications
by allowing their behavior to be modified with little or
no change to existing application software.

The two meta-programming mechanisms we focus on in 
this paper are: 


• Smart proxies, which are application-provided stub implementations
that transparently override the default stubs created
by an ORB to customize client behavior on a per-interface
basis.

• Interceptors, which are objects that an ORB invokes in
the path of an operation invocation to monitor or modify the
behavior of the invocation transparently.


These two meta-programming mechanisms can be used to 
configure new or enhanced functionality into CORBA appli- 
cations with minimal impact on existing software. The mate- 
rial presented in this paper is based on our experience imple- 
menting and using smart proxies and interceptors in TAO [3], 
which is an open-source, CORBA-compliant ORB designed to
support applications with demanding quality-of-service (QoS)
requirements. 

Paper organization: The remainder of this paper is structured
as follows: Section 2 presents an overview of the smart
proxy and interceptor meta-programming mechanisms; Section 3
describes the patterns that guided the development of
TAO’s smart proxy and interceptor mechanisms and resolved 
key design challenges; Section 4 illustrates the performance 
characteristics of TAO’s smart proxy and interceptor mecha- 
nisms; Section 5 compares our work with related research; and 
Section 6 presents concluding remarks. 


2 Overview of Smart Proxies and Interceptors


DOC middleware provides stub and skeleton mechanisms that 
serve as a “glue” between the client and servants, respec- 
tively, and the ORB. For example, CORBA stubs implement 
the Proxy pattern [4] and marshal operation information and 
data type parameters into a standardized request format. Like- 
wise, CORBA skeletons implement the Adapter pattern [4] 
and demarshal the operation information and typed parame- 
ters stored in the standardized request format. 

CORBA stubs and skeletons can be generated automati- 
cally from schemas defined using the OMG Interface Defi- 
nition Language (IDL). A CORBA IDL compiler transforms 
application-supplied OMG IDL definitions into stubs and 
skeletons written using a particular programming language, 
such as C++ or Java. In addition to providing program- 
ming language and platform transparency, an IDL compiler 
eliminates common sources of network programming errors 
and provides opportunities for automated compiler optimiza- 
tions [5]. 

Traditionally, the stubs and skeletons generated by an IDL 
compiler are fixed, i.e., the code emitted by the IDL compiler 
is determined at translation time. This design shields applica- 
tion developers from the tedious and error-prone network pro- 
gramming details needed to transmit client operation invoca- 
tions to server object implementations. Fixed stubs and skele- 
tons make it hard, however, for applications to adapt readily 
to certain types of changes in requirement or environmental 
conditions, such as: 


• The need to monitor system resource utilization may not
be recognized until after an application has been deployed.

• Certain remote operations may require additional parameters
in order to execute securely in a particular environment.

• The priority at which clients invoke or servers handle a
request may vary according to environmental conditions,
such as the amount of CPU or network bandwidth available
at run-time.


In applications based on CORBA middleware with conventional
fixed stubs/skeletons, these types of changes often require
re-engineering and re-structuring of existing application
software. One way to minimize the impact of these changes 
is to devise meta-programming mechanisms that allow appli- 
cations to adapt to various types of changes with little or no 
modifications to existing software. For example, stubs, skele- 
tons, and certain points in the end-to-end operation invocation 
path can be treated as meta-objects [6], which are objects that 
refine the capability of base-level objects, which are the ob- 
jects comprising the bulk of application programs. 


As shown in Figure 1, CORBA ORBs are responsible
for transmitting client operation invocations to target objects.
When a client invokes an operation, a stub implemented as
a meta-object can act in conjunction with transport-protocol
meta-objects to access and/or transform a client operation
invocation into a message and transmit it to a server.
Corresponding meta-objects on the server’s request processing path
can access and/or perform inverse transformations on the
operation invocation message and dispatch the message to its
servant. An invocation result is delivered in a similar fashion in
the reverse direction.


Figure 1: Interactions Between Requests and Meta-objects
End-to-End


As all operation invocations pass through meta-objects, cer- 
tain aspects of application and middleware behavior can be 
adapted transparently when system requirements and envi- 
ronmental conditions change by simply modifying the meta- 
objects. To modify meta-objects, the DOC middleware can 
either (1) provide mechanisms for developers to install customized
meta-objects for the client or (2) embed hooks implementing
a meta-object protocol (MOP) [6] in the meta-objects
and provide mechanisms to install objects implementing the 
MOP to strategize these meta-object behaviors. In the context 
of CORBA, smart proxies are customized meta-objects and 
interceptors are objects that implement the MOP. 


2.1 Overview of Smart Proxies


Most CORBA application developers use the fixed stubs gen- 
erated by an IDL compiler without concern for how the stubs 
are implemented. There are situations, however, where the de- 
fault stub behavior is inadequate. For example, an application 
developer may wish to change stub code transparently in order 
to: 


• Perform application-specific functionality, such as logging;

• Add parameters to a request;

• Cache requests or replies to enable batch transfer or minimize
calls to a remote target object, respectively;

• Support advanced quality-of-service (QoS) features, such
as load balancing and fault-tolerance; or

• Enforce security mechanisms, such as authentication of
credentials.


To support these capabilities without modifying existing client 
code, applications must be able to override the default stub 
implementations selectively. These application-defined stubs 
are called smart proxies, which are customizable meta-objects
that can mediate access to target objects more flexibly than the 
default stubs generated by an IDL compiler. Smart proxies 
allow developers to modify the behavior of interfaces without 
re-implementing client applications or target objects. 

The two main entities in smart proxy designs are (1) the
smart proxy factory and (2) the smart proxy meta-object,
which are shown in Figure 2. When using a smart proxy
to modify the behavior of an interface, the developer implements
the smart proxy class and registers it with the ORB.
After installing the smart proxy factory, the ORB automatically
uses the application-supplied factory to create object references
when a client invokes the _narrow operation of an
interface. Thus, if smart proxies are installed before a client
accesses these interfaces, the client application can transparently
use the new behavior of the proxy returned by the factory.


Figure 2: TAO’s Smart Proxy Model

Smart proxies are not yet standardized in CORBA, though 
many ORBs support this feature as a proprietary extension. 
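
The factory-based design described in this section can be
sketched in a few lines. This is an illustrative model in Python,
not TAO's C++ API: the classes Orb, DefaultStub, CachingProxy, and
QuoterServant, and the method register_proxy_factory, are all
hypothetical names, and the servant is called directly instead of
over a network.

```python
class DefaultStub:
    """Stands in for the fixed stub an IDL compiler would generate."""

    def __init__(self, target):
        self._target = target

    def get_quote(self, symbol):
        return self._target.get_quote(symbol)


class CachingProxy(DefaultStub):
    """Smart proxy: same interface, but replies are cached on the client."""

    def __init__(self, target):
        super().__init__(target)
        self._cache = {}

    def get_quote(self, symbol):
        if symbol not in self._cache:
            self._cache[symbol] = super().get_quote(symbol)
        return self._cache[symbol]


class Orb:
    def __init__(self):
        self._factories = {}

    def register_proxy_factory(self, interface, factory):
        self._factories[interface] = factory

    def narrow(self, interface, target):
        # Use the application-supplied factory if one was installed.
        factory = self._factories.get(interface, DefaultStub)
        return factory(target)


class QuoterServant:
    def __init__(self):
        self.calls = 0

    def get_quote(self, symbol):
        self.calls += 1
        return 42.0


servant = QuoterServant()
orb = Orb()
orb.register_proxy_factory("Quoter", CachingProxy)

quoter = orb.narrow("Quoter", servant)   # client code is unchanged
first = quoter.get_quote("ACME")
second = quoter.get_quote("ACME")        # served from the proxy's cache
```

The client only ever calls narrow and get_quote, so installing or
removing the factory changes behavior without touching client
code, provided the factory is registered before narrow is called.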


2.2 Overview of Interceptors 


The smart proxies feature outlined above is a meta- 
programming mechanism that increases the flexibility of client 
applications. Interceptors are another meta-programming 
mechanism used in DOC middleware to increase the flexibility 
of both client and server applications. In CORBA, intercep- 
tors are standard meta-objects that stubs, skeletons, and certain 
points in the end-to-end operation invocation path can invoke 
at predefined “interception points.” 

Prior to CORBA 2.3.1, interceptors were under-specified
and therefore non-portable. In contrast, the interceptors dis- 
cussed in this paper are based on the so-called “Portable Inter- 
ceptors” specification [7], which is being ratified by the OMG. 
Two types of interceptors are defined in the CORBA Portable 
Interceptor specification: 


• Request interceptors, which deal with operation invocations;

• IOR interceptors, which insert information into interoperable
object references (IORs).


Both types of interceptor are described below.


2.2.1 Request Interceptors 


Request interceptors can be decomposed into client request in- 
terceptors and server request interceptors, which are designed 
to intercept the flow of a request/reply sequence through the 
ORB at specific points on clients and servers, respectively. De- 
velopers can install instances of these interceptors into an ORB 
via an IDL interface defined by the Portable Interceptor speci- 
fication. Regardless of what interface or operation is invoked, 
after request interceptors are installed they will be called on 
every operation invocation at the pre-determined ORB inter- 
ception points shown in Figure 3. 

As shown in this figure, request interception points occur
in multiple parts of the end-to-end invocation path: when a
client sends a request, when a server receives a request, when
a server sends a reply, and when a client receives a reply.
Different hook methods will be called at different points in this
interceptor chain. For example, the send_request hook is
called on the client before the request is marshaled and the
receive_request hook is called on the server after the request
is demarshaled.

Figure 3: Request Interception Points in the CORBA Portable
Interceptor Specification


Compared to a client invocation path, a server invocation
path has an additional interception point called
receive_request_service_contexts, which is invoked
before the POA dispatches a servant manager. This
interception point prevents unnecessary upcalls to a servant.
For example, in the CORBA Security Service [8] framework
this interception point can be used to inspect security-related
credentials piggybacked in a service context list entry. If the
credentials are valid the upcall can proceed to other interceptors
(if they exist) or to the servant; if not, an exception will be
returned to the client.
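
The credential check described above can be sketched as follows.
This is a simplified Python model rather than the Portable
Interceptor API: the identifier SECURITY_CONTEXT_ID, the classes
CredentialInterceptor and AccountServant, the NoPermission
exception, and the function handle_request are assumptions made
for the sketch, and service contexts are modeled as a dictionary.

```python
SECURITY_CONTEXT_ID = 9001  # hypothetical identifier for this sketch


class NoPermission(Exception):
    pass


class CredentialInterceptor:
    def __init__(self, valid_tokens):
        self._valid_tokens = set(valid_tokens)

    def receive_request_service_contexts(self, service_contexts):
        token = service_contexts.get(SECURITY_CONTEXT_ID)
        if token not in self._valid_tokens:
            raise NoPermission("invalid or missing credentials")


class AccountServant:
    def __init__(self):
        self.upcalls = 0

    def balance(self):
        self.upcalls += 1
        return 100


def handle_request(interceptors, service_contexts, servant, operation):
    # Runs before the servant is involved, so a rejected request
    # never causes an upcall.
    for interceptor in interceptors:
        interceptor.receive_request_service_contexts(service_contexts)
    return getattr(servant, operation)()


servant = AccountServant()
checks = [CredentialInterceptor({"alice-token"})]

accepted = handle_request(
    checks, {SECURITY_CONTEXT_ID: "alice-token"}, servant, "balance")

try:
    handle_request(checks, {}, servant, "balance")
    rejected = False
except NoPermission:
    rejected = True
```

Only the request carrying a valid token reaches the servant; the
second request is answered with an exception and the upcall count
stays at one.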


The behavior of an interceptor can be defined by an ap- 
plication developer. An interceptor can examine the state of 
the request that it is associated with and perform various ac- 
tions based on the state. For example, interceptors can invoke 
other CORBA operations, access information in a request, in- 
sert/extract piggybacked messages in a request’s service con- 
text list, redirect requests to other target objects, and/or throw 
exceptions based on the object the original request is invoked 
upon and the type of the operation. Each of these capabilities 
is described below: 


Nested invocations: A request interceptor can invoke oper- 
ations on other CORBA objects before the current invocation 
it is intercepting completes. For example, monitoring and de- 
bugging utilities can use this feature to log information associ- 
ated with each operation invocation. To avoid causing infinite 
recursion, developers must be careful to act only on the target
interfaces and operations they intend to affect when perform- 
ing nested invocations in an interceptor. 


Accessing request information: Request interceptors can
access various information associated with an invocation,
such as the operation name, parameters, exception
lists, return values, and the request id via the MOP interface
as defined in the Portable Interceptor specification.
Interceptors cannot, however, modify parameters
or return values. This request/reply information is encapsulated
in an instance of ClientRequestInfo or
ServerRequestInfo classes, which derive from the
RequestInfo class and contain the information listed above
for each invocation.

For example, client request interceptors are passed 
ClientRequestInfo and server request interceptors are 
passed ServerRequestInfo. These RequestInfo-derived
objects can use features provided by the CORBA
Dynamic module. This module is a combination of pseudo- 
IDL types, such as RequestContext and Parameter, 
declared in earlier CORBA specifications. These types fa- 
cilitate on-demand access of request information from the 
RequestInfo to avoid unnecessary overhead if an inter- 
ceptor does not need all the information available with the 
RequestInfo. 


Service context manipulation: As mentioned earlier, re- 
quest interceptors cannot change parameters or the return 
value of an operation. They can, however, manipulate ser- 
vice contexts that are piggybacked in operation requests and 
replies exchanged between the clients and servers. A service 
context is a sequence field in a GIOP message that can transmit
“out-of-band” information, such as authentication credentials, 
transaction contexts, operation priorities, or policies associ- 
ated with requests. 

For example, the CORBA Security Service uses request in- 
terceptors to insert user identity via service contexts. Like- 
wise, the CORBA Transaction Service uses request inter- 
ceptors to insert transaction-related information into service 
contexts so it can perform extra operations, such as com- 
mit/rollback, based on the operation results in a transaction. 
Each service context entry has a unique service context iden- 
tifier that applications and CORBA components can use to ex- 
tract the appropriate service context. 
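
A minimal sketch of this mechanism, under stated assumptions:
the Request class, the two interceptor classes, and the identifier
TRANSACTION_CONTEXT_ID below are hypothetical, and the GIOP
message is reduced to a plain Python object whose service context
list is a dictionary keyed by context identifier.

```python
TRANSACTION_CONTEXT_ID = 9002  # hypothetical identifier for this sketch


class Request:
    def __init__(self, operation, args):
        self.operation = operation
        self.args = tuple(args)      # interceptors read these, never change them
        self.service_contexts = {}   # context id -> out-of-band payload


class ClientTransactionInterceptor:
    def __init__(self, transaction_id):
        self._transaction_id = transaction_id

    def send_request(self, request):
        # Piggyback the transaction id; the operation arguments are untouched.
        request.service_contexts[TRANSACTION_CONTEXT_ID] = self._transaction_id


class ServerTransactionInterceptor:
    def __init__(self):
        self.seen = []

    def receive_request(self, request):
        # Extract only the entry this service owns, by its identifier.
        txn = request.service_contexts.get(TRANSACTION_CONTEXT_ID)
        if txn is not None:
            self.seen.append((txn, request.operation))


request = Request("withdraw", [25])
ClientTransactionInterceptor("txn-7").send_request(request)

server_side = ServerTransactionInterceptor()
server_side.receive_request(request)
```

The transaction identifier travels alongside the request without
appearing in the operation's signature, which is what allows a
service to be added without changing IDL interfaces.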


Location forwarding: Request interceptors can be used to 
forward a request to a different location, which may or may 
not be known to the ORB a priori. This can be done via 
the PortableInterceptor::ForwardRequest exception,
which allows an interceptor to inform the ORB that
a retry should occur upon the new object indicated in the ex- 
ception. The exception can also indicate whether the new ob- 
ject should be used for all future invocations or just for the 
forwarded request. 

Since the ForwardRequest exception can be raised at 
most interception points, it can be used to provide fault tolerance
and load balancing [9]. For example, the IOR of a
replicated object can be used as the forward object in this exception.
When the object dies for some reason (and this situation
is conveyed to the interceptor), this exception can be raised
even before the POA tries to make an upcall.
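
The retry behavior can be sketched as follows. The model is a
deliberately small Python approximation, not ORB code: the classes
Replica, FailoverInterceptor, and ObjectReference are hypothetical,
the ForwardRequest class here is a local stand-in for the real
exception, and the max_forwards bound is an assumption added so
the sketch cannot loop forever.

```python
class ForwardRequest(Exception):
    def __init__(self, forward, permanent):
        super().__init__("request forwarded")
        self.forward = forward        # object to retry the request on
        self.permanent = permanent    # use it for all future invocations?


class Replica:
    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive

    def whoami(self):
        return self.name


class FailoverInterceptor:
    def __init__(self, backup):
        self._backup = backup

    def send_request(self, target):
        if not target.alive:
            raise ForwardRequest(self._backup, permanent=True)


class ObjectReference:
    def __init__(self, target, interceptor):
        self.target = target
        self._interceptor = interceptor

    def invoke(self, operation, max_forwards=3):
        target = self.target
        for _ in range(max_forwards + 1):
            try:
                self._interceptor.send_request(target)
            except ForwardRequest as forward:
                if forward.permanent:
                    self.target = forward.forward
                target = forward.forward
                continue  # retry on the new object
            return getattr(target, operation)()
        raise RuntimeError("too many forwards")


primary = Replica("primary", alive=False)
backup = Replica("backup")
reference = ObjectReference(primary, FailoverInterceptor(backup))
answer = reference.invoke("whoami")
```

Because the forward is marked permanent, later invocations on the
same reference go straight to the backup replica.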


Multiple interceptors: Multiple request interceptors can be 
registered with an ORB, which will then iterate through them 
and invoke the appropriate interception operation at every in- 
terception point according to the following rules: 


• For each request interceptor, only one starting interception
point can be called for a given invocation. A
starting interception point is the first point invoked in
a request/reply sequence. For instance, the starting
points for a client ORB include send_request and
send_poll. Likewise, the starting point for a server
ORB is receive_request_service_contexts.

• For each request interceptor, only one ending interception
point can be called for a given invocation. The ending
interception point is the last juncture where an interception
may occur in the request/reply sequence. The ending
interception points on a client ORB are receive_reply,
receive_exception, and receive_other, and the
ending interception points for a server ORB consist of
send_reply, send_exception, and send_other.

• There can be multiple intermediate interception points.

• Intermediate interception points cannot be invoked in the
case of an exception.

• The ending interception point for a given interceptor will
be called only if the starting interception point runs to
completion.


Multiple interceptors are invoked using a flow-stack model.
When initiating an operation invocation, an interceptor is 
pushed onto the stack after its starting interception point com- 
pletes successfully. When an invocation completes, the inter- 
ceptors are popped off the stack and invoked in reverse order. 
The flow-stack model ensures that only interceptors executed 
successfully for an operation can process the reply/exceptions. 
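
The flow-stack model can be sketched on the client side as
follows. The Recorder class and the invoke function are
hypothetical, and the sketch covers only one starting point
(send_request) and two ending points (receive_reply and
receive_exception).

```python
class Recorder:
    """Interceptor that records which of its hooks were called."""

    def __init__(self, name, log, fail=False):
        self.name = name
        self._log = log
        self._fail = fail

    def send_request(self):
        if self._fail:
            raise RuntimeError(self.name + " rejected the request")
        self._log.append(self.name + ".send_request")

    def receive_reply(self):
        self._log.append(self.name + ".receive_reply")

    def receive_exception(self):
        self._log.append(self.name + ".receive_exception")


def invoke(interceptors, operation):
    stack = []
    try:
        for interceptor in interceptors:
            interceptor.send_request()   # starting interception point
            stack.append(interceptor)    # pushed only after it completes
        result = operation()
    except Exception:
        while stack:
            stack.pop().receive_exception()
        raise
    while stack:
        stack.pop().receive_reply()      # reverse order of the starting points
    return result


log = []
invoke([Recorder("A", log), Recorder("B", log)], lambda: "ok")
success_log = list(log)

log.clear()
try:
    invoke(
        [Recorder("A", log), Recorder("B", log, fail=True), Recorder("C", log)],
        lambda: "ok",
    )
except RuntimeError:
    pass
failure_log = list(log)
```

In the failing run, B's starting point does not complete, so B is
never pushed and never sees an ending point, C is never reached,
and only A is notified of the exception.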


Exception handling: Request interceptors can affect the
outcome of a request by raising exceptions in the inbound
or outbound invocation path. In such cases, the
send_exception operation of a server request interceptor
is invoked on the reply path and is received at the
client in the receive_exception interceptor hook. When
a send_exception or receive_exception operation
raises a ForwardRequest exception, the other interceptors
have their send_other and receive_other interception
points invoked, respectively.


2.2.2 IOR Interceptors 


IIOP version 1.1 introduced an attribute called components,
which contains a list of tagged components to be embedded 
within an IOR. When an IOR is created, tagged components 
provide a placeholder for an ORB to store extra information 
pertinent to the object. This information can contain various 
types of QoS information related to security, server thread 
priorities, network connections, CORBA policies, or other 
domain-specific information. 

The original IIOP 1.0 specification provided no standard 
way for applications or services to add new tagged compo- 
nents into an IOR. Services that require this field were there- 
fore forced to use proprietary ORB interfaces, which impeded 
their portability. The Portable Interceptors specification resolves
this problem by defining IOR interceptors.

IOR interceptors are objects invoked by the ORB when it 
creates IORs. They allow an IOR to be customized, e.g., by ap- 
pending tagged components. Whereas request interceptors ac- 
cess operation-related information via RequestInfos, IOR 
interceptors access IOR-related information via IORInfos. 


Figure 4: IOR Interceptors


Figure 4 illustrates the behavior of IOR interceptors. A server
ORB responsible for creating an IOR contains an IOR interceptor
repository. In turn, this repository contains a series
of IOR interceptors that have been registered with the
ORB. When the server process requests the ORB to create
an IOR, the ORB iterates through the IOR interceptors
in the repository using the establish_components
operation. The IOR interceptors then add tagged components
to the IOR being generated by calling add_ior_component or
add_ior_component_to_profile on the IORInfo passed in.
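
The iteration over the interceptor repository can be sketched as
follows. The operation names establish_components and
add_ior_component follow the text above, but everything else is an
assumption made for illustration: the ServerOrb class, the two
interceptor classes, the tag values, and the representation of an
IOR as a dictionary.

```python
TAG_PRIORITY = 9101   # hypothetical component tags for this sketch
TAG_SECURITY = 9102


class IORInfo:
    def __init__(self):
        self.components = []

    def add_ior_component(self, tag, data):
        self.components.append((tag, data))


class PriorityInterceptor:
    def __init__(self, priority):
        self._priority = priority

    def establish_components(self, info):
        info.add_ior_component(TAG_PRIORITY, self._priority)


class SecurityInterceptor:
    def establish_components(self, info):
        info.add_ior_component(TAG_SECURITY, "requires-auth")


class ServerOrb:
    def __init__(self):
        self._ior_interceptors = []   # the IOR interceptor repository

    def register_ior_interceptor(self, interceptor):
        self._ior_interceptors.append(interceptor)

    def create_ior(self, object_key):
        info = IORInfo()
        for interceptor in self._ior_interceptors:
            interceptor.establish_components(info)
        return {"object_key": object_key, "components": info.components}


orb = ServerOrb()
orb.register_ior_interceptor(PriorityInterceptor(5))
orb.register_ior_interceptor(SecurityInterceptor())
ior = orb.create_ior("account-1")
```

Each registered interceptor contributes its own tagged component,
so services can annotate object references without relying on
proprietary ORB interfaces.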


2.3 Evaluating Alternative Meta-Programming 
Mechanisms for ORB Middleware 


We presented an overview of smart proxies and interceptors 
above. We now evaluate these two mechanisms, and then 
compare and contrast them with two other meta-programming
mechanisms, pluggable protocols and servant managers, that
are provided by most CORBA implementations.


2.3.1 Smart Proxies vs. Interceptors 


Smart proxies and interceptors are similar in that they extend 
ORB-mediated invocations and functions. They differ, how- 
ever, in their architecture and have their own pros and cons, as 
described below. 


Intent: A smart proxy can be used for a variety of purposes, 
such as improving performance via caching, whereas inter- 
ceptors are used primarily to (1) audit and verify information 
along the invocation path and (2) redirect the operation if nec- 
essary. For instance, a server request interceptor can determine 
whether the server should handle certain operation invocations 
by inspecting the incoming requests and forwarding some re- 
quests to other servers that can handle them. 


Scope of control: A different smart proxy can be configured
for each interface, whereas the same set of interceptors
will be invoked at all the ORB-mediated points of an invocation.
Moreover, a smart proxy is solely a client mechanism,
whereas request interceptors are invoked on the request path
from client-to-server and on the reply path from server-to-client.¹


Invocation points: A smart proxy invocation point occurs 
whenever an operation is invoked through a stub. In contrast, 
interceptors are invoked at many points, including IOR cre- 
ation time and/or before a call is sent by the POA to the ser- 
vant. 


Cardinality: A client can have only a single smart proxy for 
each interface, whereas multiple interceptors can be registered 
with the ORB. 


Modifiability: Since smart proxies replace default ORB-generated
stubs completely, smart proxies can modify the parameters
or results of an operation. In contrast, the Portable
Interceptor specification does not allow request interceptors to 
change operation parameters or return values. 


Overhead: A smart proxy mechanism incurs minimal overhead,
i.e., a single extra method call per operation invocation.
In contrast, request interceptors can incur additional overhead
to access request information because information related to
the request is bundled into anys, which have higher overhead
for their insertion and extraction operations.


Standardization: Smart proxies have not yet been standardized
in the CORBA specification. CORBA interceptors will be
standardized after the Portable Interceptor specification is ratified.


¹ IOR interceptors are invoked only during object reference creation.


In general, design problems that require pre-invocation or 
per-interface extensions are well-suited for smart proxies. 
Portable interceptors, in contrast, provide a suitable solution 
for applications that require a semantically richer—albeit some- 
what more expensive—meta-programming abstraction. 


2.3.2 Servant Managers 


The CORBA POA specification [1] allows server applications 
to register servant manager objects that activate servants on- 
demand, rather than creating all servants before listening for 
requests. There are two types of servant managers in CORBA: 


• Servant activators, which provide a hook method called
incarnate that creates a servant the first time an object is
accessed by a client.

• Servant locators, which provide a hook method called
preinvoke that is invoked by a POA to create a servant for
every request on an object. Figure 5 illustrates how servant
locators are used in a CORBA application to perform various
resource management activities before dispatching an operation
to a servant.


Figure 5: Managing Resources with a Servant Locator


A servant locator is similar to an interceptor in several re- 
spects. For example, both are implementations of the Inter- 
ceptor pattern [10]. Moreover, both can (1) intercept requests 
before they are dispatched to servants, (2) invoke extra opera- 
tions, and (3) affect the outcome of request invocations, e.g., 
by throwing exceptions. Unlike interceptors, however, servant 
locators only affect the POAs that install them and can only 
provide access to a limited subset of the request-related infor- 
mation. As a consequence, they are more tightly coupled with 
POAs and servant implementations than are interceptors. 
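
The difference between the two kinds of servant manager can be
sketched as follows. The hook names incarnate and preinvoke follow
the text above; the classes ActivatorPoa, LocatorPoa, Activator,
Locator, and EchoServant are hypothetical stand-ins, and the real
POA interfaces carry more parameters than are modeled here.

```python
class EchoServant:
    def echo(self, text):
        return text


class Activator:
    def __init__(self):
        self.incarnations = 0

    def incarnate(self, object_id):
        self.incarnations += 1
        return EchoServant()


class Locator:
    def __init__(self):
        self.lookups = 0

    def preinvoke(self, object_id, operation):
        # A natural place for per-request resource management.
        self.lookups += 1
        return EchoServant()


class ActivatorPoa:
    def __init__(self, activator):
        self._activator = activator
        self._active = {}   # servants are remembered after first use

    def dispatch(self, object_id, operation, *args):
        if object_id not in self._active:
            self._active[object_id] = self._activator.incarnate(object_id)
        return getattr(self._active[object_id], operation)(*args)


class LocatorPoa:
    def __init__(self, locator):
        self._locator = locator

    def dispatch(self, object_id, operation, *args):
        servant = self._locator.preinvoke(object_id, operation)
        return getattr(servant, operation)(*args)


activator = Activator()
locator = Locator()
activator_poa = ActivatorPoa(activator)
locator_poa = LocatorPoa(locator)

replies = []
for _ in range(3):
    replies.append(activator_poa.dispatch("obj-1", "echo", "hi"))
    replies.append(locator_poa.dispatch("obj-1", "echo", "hi"))
```

After three requests on the same object, the activator has been
consulted once and the locator three times, which is why a locator
resembles an interceptor on the server's dispatch path.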


2.3.3 Pluggable Protocols Frameworks


Another type of meta-programming mechanism provided
by some DOC middleware is the pluggable protocols framework
[11, 12], which is in the process of being standardized
by the OMG in the Extensible Transport Framework [13] specification
effort. These frameworks decouple the ORB’s transport
protocols from its component architecture. Developers
can therefore add new protocols without requiring changes to
existing application software.


Figure 6 illustrates TAO’s pluggable protocols framework,
which allows developers to install new protocols into the
ORB by implementing customized pluggable protocol objects.
Higher-level application components and CORBA services
use the Component Configurator pattern [10] to dynamically
configure custom protocols into TAO’s pluggable protocols
framework without requiring obtrusive changes to themselves
or the ORB.


Figure 6: TAO’s Pluggable Protocols Framework Architecture


As with interceptors and smart proxies, pluggable protocols
frameworks are meta-programming mechanisms that add
functionality to ORBs. However, whereas the other two mechanisms
alter the semantics of objects, pluggable protocols frameworks
alter the underlying ORB transport mechanism. Thus,
they do not permit fine-grained control over objects since they
affect all objects in an ORB and it is hard to vary the transport
mechanism at the level of object references. Moreover, since
pluggable protocols deal directly with the communication infrastructure,
they are usually more complex to program than
interceptors or smart proxies.


Figure 7 compares the various meta-programming mechanisms
along a number of dimensions described above.
Portable interceptors have the highest overhead since they are
the most flexible meta-programming mechanism. Although
other mechanisms have less overhead compared to portable
interceptors, they are targeted at more specific system mechanisms.
When combined with patterns, such as Component
Configurator [10], and OS features, such as explicit dynamic
linking [14], these meta-programming mechanisms can all be
configured dynamically into CORBA clients and servers.


Figure 7: Comparing Alternative Meta-programming Mechanisms


3 Key Design Challenges and Pattern-based Resolutions


In this section, we explore how smart proxies and intercep- 
tors are implemented in TAO. To clarify and generalize our 
approach, the discussion below focuses on the patterns [4] we 
applied to resolve the key design challenges faced during our 
development process. 


3.1 Smart Proxy Design Challenges and Resolutions


As mentioned in Section 2.1, the goal of using smart proxies 
is to change/add behaviors to existing programs with minimal 
modifications to client applications. Below, we discuss the key 
design challenges we faced while refactoring TAO’s existing 
stub architecture to support smart proxies. 


3.1.1 Challenge: Providing Flexible Support for Smart 
Proxies 


Context: The proxy framework generated by TAO’s IDL 
compiler should allow applications to use customized prox- 
ies transparently. For example, changes to client applications 
that use customized proxies must be localized. In particular, 
developers should be able to install customized proxies with 
little or no change to client application code. 


Problem: TAO’s original IDL compiler generated only fixed
default proxies. In particular, the _narrow operation it
generated for each interface returned a default proxy. If
developers require more flexibility, however, the _narrow
operation must be able to return either an IDL-generated
default proxy or a custom smart proxy.

Since the _narrow operation is generated by TAO’s IDL
compiler as part of the client’s stub, it is not possible to
modify this method externally from a client application.
Moreover, since fixed default stubs were generated, any changes
required manually modifying the IDL-generated code. Clearly,
this approach was inflexible, and the problem had to be solved
at the stub-generation level.


Solution → Apply the Factory Method, Adapter, and Singleton
patterns: We applied these design patterns [4] in
TAO’s smart proxy framework to provide the necessary
flexibility to create different types of proxies transparently in
TAO’s IDL-generated code, as follows:


• The Factory Method pattern defers instantiation of various
types of meta-objects to subclasses.


• The Adapter pattern provides a higher level of abstraction
for TAO’s proxy factories and delegates creation
requests to the appropriate factory.


• The Singleton pattern makes the proxy factory adapter a
global access point for factory registration from program
initialization to termination.


Figure 8 illustrates how we applied these three patterns in
TAO to provide flexible support for smart proxies. By using
these patterns, applications can obtain either the default
IDL-generated proxy or a smart proxy without changing existing
code manually. For example, after an application registers a
per-interface smart proxy factory, the _narrow operation call
will create the appropriate proxy automatically.


Figure 8: Applying Patterns to Provide Flexible Support for
Smart Proxies
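
The interplay of the three patterns can be illustrated with a
minimal C++ sketch. It is not TAO’s generated code: the names
ShowProxy, SmartShowProxy, ShowProxyFactoryAdapter, and
narrow_show are hypothetical stand-ins for an IDL-generated
default proxy, a user-written smart proxy, the proxy factory
adapter, and the generated _narrow operation, respectively.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an IDL-generated default proxy (stub).
class ShowProxy {
public:
  virtual ~ShowProxy() = default;
  virtual std::string kind() const { return "default"; }
};

// Hypothetical user-written smart proxy.
class SmartShowProxy : public ShowProxy {
public:
  std::string kind() const override { return "smart"; }
};

// Factory Method: subclasses decide which proxy type is instantiated.
class ShowProxyFactory {
public:
  virtual ~ShowProxyFactory() = default;
  virtual std::unique_ptr<ShowProxy> create_proxy() const {
    return std::make_unique<ShowProxy>();
  }
};

class SmartShowProxyFactory : public ShowProxyFactory {
public:
  std::unique_ptr<ShowProxy> create_proxy() const override {
    return std::make_unique<SmartShowProxy>();
  }
};

// Adapter + Singleton: a single global registration point that forwards
// creation requests to whichever factory is currently registered.
class ShowProxyFactoryAdapter {
public:
  static ShowProxyFactoryAdapter& instance() {
    static ShowProxyFactoryAdapter adapter;  // created on first use
    return adapter;
  }
  void register_factory(std::unique_ptr<ShowProxyFactory> factory) {
    factory_ = std::move(factory);
  }
  std::unique_ptr<ShowProxy> create_proxy() const {
    return factory_->create_proxy();
  }

private:
  // Starts out with the default factory, so unmodified clients get
  // default proxies.
  ShowProxyFactoryAdapter() : factory_(std::make_unique<ShowProxyFactory>()) {}
  std::unique_ptr<ShowProxyFactory> factory_;
};

// Stand-in for the generated _narrow(): client code calls it unchanged.
std::unique_ptr<ShowProxy> narrow_show() {
  return ShowProxyFactoryAdapter::instance().create_proxy();
}

// Usage: registering a factory changes what narrow_show() returns.
void check_factory_registration() {
  assert(narrow_show()->kind() == "default");
  ShowProxyFactoryAdapter::instance().register_factory(
      std::make_unique<SmartShowProxyFactory>());
  assert(narrow_show()->kind() == "smart");
}
```

In this sketch, client code that calls narrow_show() is untouched
by the registration, which is the property the design aims for.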


3.1.2 Challenge: Treating Remote and Collocated Smart 
Proxies Uniformly 


Context: A target object can be either remote or it can be 
collocated in the client’s address space [15]. TAO provides 
customized meta-objects called collocated proxies to optimize 
performance for collocated objects. Smart proxies should pro- 
vide similar functionality to collocated and remote proxies 
since the ability to differentiate remote and collocated smart 
proxies provides developers with greater flexibility.


Problem: Depending on where a target object resides, a
developer may or may not wish to invoke the smart proxy
installed for the object. For example, a developer may not
want to cache operation results in a collocated smart proxy
because these calls are already resolved locally. Originally, TAO
treated the generation of collocated stubs as a special case, and
if smart proxies were installed they would supersede the
default stubs, even if the stubs were collocated.

Ignoring collocation optimizations, however, may cause un- 
necessary waste by trying to optimize a bottleneck that does 
not exist. Therefore, it is necessary to distinguish the remote 
and collocated case to take full advantage of this construct and 
avoid unnecessary waste of system resources, such as memory 
and CPU cycles. In addition, smart proxies must (1) provide 
applications with the same interface as default proxies and (2) 
be able to call down to the default proxy to communicate with 
remote target objects. 


Solution → Apply the Composite pattern: The Composite
pattern [4] supports part/whole relationships and allows all
objects in such composite structures to be processed uniformly.
We applied the Composite pattern to TAO to provide a uniform
view among the different proxies available to clients. As shown
in Figure 9, in this design smart proxy classes (1) inherit from
the default proxy and (2) also store a pointer to the default
proxy to make invocations on the target object. Collocated and
remote proxies are children of the default proxy. Thus, smart
proxies can make calls to the remote or collocated proxy
transparently, while providing the same application interface as
the default proxies.


Figure 9: Applying the Composite Pattern to TAO’s Smart
Proxy Design
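
The structure described above can be sketched in a few lines of
C++. This is an illustration under simplifying assumptions, not
TAO’s class hierarchy: DefaultProxy, RemoteProxy,
CollocatedProxy, and CountingSmartProxy are hypothetical names,
and the single ping operation stands in for an IDL-defined
operation.

```cpp
#include <cassert>
#include <string>

// Common interface, playing the role of the default proxy.
class DefaultProxy {
public:
  virtual ~DefaultProxy() = default;
  virtual std::string ping() = 0;
};

// A real remote proxy would marshal a request; here it only reports itself.
class RemoteProxy : public DefaultProxy {
public:
  std::string ping() override { return "remote"; }
};

// A real collocated proxy would call the servant directly.
class CollocatedProxy : public DefaultProxy {
public:
  std::string ping() override { return "collocated"; }
};

// The smart proxy (1) inherits the default proxy interface and
// (2) stores a pointer to the proxy it forwards to.
class CountingSmartProxy : public DefaultProxy {
public:
  explicit CountingSmartProxy(DefaultProxy* target) : target_(target) {}
  std::string ping() override {
    ++calls_;                // added behavior
    return target_->ping();  // call down to the wrapped proxy
  }
  int calls() const { return calls_; }

private:
  DefaultProxy* target_;  // not owned
  int calls_ = 0;
};

// Client code sees only the DefaultProxy interface.
std::string call_through(DefaultProxy& proxy) { return proxy.ping(); }

// Usage: the same smart proxy class wraps either kind of proxy.
void check_uniform_proxies() {
  RemoteProxy remote;
  CollocatedProxy collocated;
  CountingSmartProxy over_remote(&remote);
  CountingSmartProxy over_collocated(&collocated);
  assert(call_through(over_remote) == "remote");
  assert(call_through(over_collocated) == "collocated");
  assert(over_remote.calls() == 1);
}
```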


3.2 Interceptor Design Challenges and Resolutions


As discussed in Section 2.2, interceptors can extend the behav- 
ior of CORBA operations with minimal changes to client and 
server applications. In this section, we discuss the key design 
challenges faced while enhancing TAO’s existing invocation 
architecture to support interceptors.


3.2.1 Challenge: Making Information Retrieval Possible
Per-Operation


Context: Request interceptor hook methods are invoked at 
different interception points along the invocation path. These 
interceptors must be able to (1) verify and audit information 
being passed to the target object as the invocation continues 
and (2) potentially terminate the invocation before it reaches 
the target object. 


Problem: An ORB must provide information in response 
to interceptor queries. This information may be operation- 
specific and even temporal. For example, the result of an op- 
eration may be available only after the POA makes an upcall 
to a servant and the operation executes. 

An ORB must therefore have a generic way to access 
operation-level information and disclose this information to 
interceptors that are invoked at ORB-mediated interception 
points. Originally, TAO did not maintain this information to 
avoid degrading the normal execution of the invocation in situ- 
ations where this information was not required by applications. 
However, TAO’s original design made it hard for applications 
to influence invocation behavior. 


Solution → Generation of nested RequestInfo classes for
each interface operation: To provide invocation information
dynamically and efficiently, we modified TAO’s IDL compiler
to generate RequestInfo classes for each operation.
RequestInfo classes are instantiated for each operation
invocation and passed to the interceptors during the invocation.
Thus, interceptors can access operation-related information,
as shown in Figure 10. Every operation in an IDL interface
may have different formal parameters, result types, and user
exceptions. To minimize the overhead of copying multiple
arguments and the return value of the upcall, we only store
references to, rather than copies of, the parameters, results,
and exceptions.


Figure 10: TAO’s Portable Interceptor Design


We added TAO-specific methods to each RequestInfo
class and used these methods internally to update the result
and the exception thrown, rather than instantiating a
new RequestInfo object before every interception point
is called. For instance, the result of an operation is obtained
only after the POA makes the upcall and the client
receives a reply. At this point, the client can verify the result
in the receive_reply interceptor hook by querying the
RequestInfo object, making it necessary to update the result
before this interception point is invoked. Thus, temporal
information can also be propagated to interceptors.
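
A minimal C++ sketch of this idea follows. It assumes a
hypothetical per-operation class, OrderTicketsRequestInfo, that
holds references rather than copies, and a hypothetical
set_result method standing in for the TAO-specific update
methods; none of these names are taken from TAO or from the
Portable Interceptor specification.

```cpp
#include <cassert>

// Hypothetical per-operation request info for
//   long order_tickets (in short number);
// It refers to the caller's argument and result instead of copying them.
class OrderTicketsRequestInfo {
public:
  // The referenced argument must outlive this object.
  explicit OrderTicketsRequestInfo(const short& number) : number_(number) {}

  const short& number() const { return number_; }
  bool has_result() const { return result_ != nullptr; }
  long result() const { return *result_; }  // precondition: has_result()

  // Stand-in for an ORB-internal update method: called once the reply
  // is available, before the reply interception point runs.
  void set_result(const long& result) { result_ = &result; }

private:
  const short& number_;
  const long* result_ = nullptr;
};

// Hypothetical interceptor with a request hook and a reply hook.
struct AuditInterceptor {
  short seen_number = 0;
  long seen_result = -1;
  void send_request(const OrderTicketsRequestInfo& info) {
    seen_number = info.number();
  }
  void receive_reply(const OrderTicketsRequestInfo& info) {
    seen_result = info.result();
  }
};

// Stand-in for the client invocation path.
long invoke_order_tickets(short number, AuditInterceptor& interceptor) {
  OrderTicketsRequestInfo info(number);  // one object per invocation
  interceptor.send_request(info);
  long result = number * 100L;  // placeholder for the remote upcall
  info.set_result(result);      // update the same object in place
  interceptor.receive_reply(info);
  return result;
}

// Usage: the reply hook observes the result set after the "upcall".
void check_request_info_update() {
  AuditInterceptor audit;
  assert(invoke_order_tickets(3, audit) == 300);
  assert(audit.seen_number == 3);
  assert(audit.seen_result == 300);
}
```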


3.2.2 Challenge: Avoiding Gratuitous Waste Constructing
RequestInfos


Context: Interceptors can access any request-related
information. Their interface must therefore be sufficiently general
to incorporate any type of data. In CORBA, any is a generic
type that can hold information of any other type, stored
as type/value tuples.


Problem: In general, not all interceptors installed in an ORB 
are interested in handling all information, or even all opera- 
tions. For example, security-related interceptors may not be 
interested in what operation is being invoked, but only want 
to know the contents of the service context list. Likewise, an 
auditing interceptor may only be interested in the parameters 
of certain operations of certain objects, while ignoring others 
altogether. 

Although CORBA’s any type is flexible, it is less efficient 
and more resource consumptive than other common CORBA 
data types, such as long or struct. We need to avoid the 
overhead of any insertion operators if installed interceptors 
are not interested in certain operation information. There is no 
way, however, to predict what interceptors will be interested 
in a priori. 


Solution → On-demand creation of operation information:
To avoid unnecessary waste of resources, we applied the Lazy
Initialization pattern [16] to make sure the operation
information is only inserted into any objects the first time a related
interface is accessed by an interceptor via its
RequestInfo-derived interface. This design ensures that pertinent
information in RequestInfo-derived objects will only be created
if an interceptor is interested in the information. In TAO,
we retrieve this information via types defined in the CORBA
Dynamic module.

The Dynamic module defines the collection of request
parameters, results, and exceptions, held in anys, as a sequence
of structures that an application interceptor can extract and use.
In TAO, methods returning Dynamic objects are implemented
to avoid the gratuitous waste of storing all information up
front into lists of anys, as shown in Figure 11. In particular,
this information is inserted into anys only when queried,
which occurs just once. Subsequent queries simply return the
any variables created previously. Thus, unless an interceptor
needs to query a particular piece of request information, it
incurs no additional overhead. This optimization is targeted at
the common case where interceptors are used to pass service
contexts.


Figure 11: TAO applies Lazy Initialization building Dynamic
objects in RequestInfo
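
The Lazy Initialization step can be sketched as follows. The
sketch is an assumption-laden simplification: AnyLike is a
hypothetical type-name/value pair standing in for a CORBA any,
and LazyRequestInfo with its arguments() accessor is a
hypothetical stand-in for a RequestInfo-derived class. The
conversions() counter exists only to make the laziness
observable.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for a CORBA any: a type name plus a value.
struct AnyLike {
  std::string type_name;
  std::string value;
};

// Hypothetical request info for an operation taking (long id, string user).
class LazyRequestInfo {
public:
  // Both referenced arguments must outlive this object.
  LazyRequestInfo(const long& id, const std::string& user)
      : id_(id), user_(user) {}

  // Builds the any-like parameter list on the first query only;
  // later queries return the list created previously.
  const std::vector<AnyLike>& arguments() {
    if (!arguments_built_) {
      arguments_.push_back({"long", std::to_string(id_)});
      arguments_.push_back({"string", user_});
      arguments_built_ = true;
      ++conversions_;
    }
    return arguments_;
  }

  // Number of times the (expensive) conversion actually ran.
  int conversions() const { return conversions_; }

private:
  const long& id_;
  const std::string& user_;
  std::vector<AnyLike> arguments_;
  bool arguments_built_ = false;
  int conversions_ = 0;
};

// Usage: no conversion until queried, exactly one conversion afterwards.
void check_lazy_arguments() {
  long id = 42;
  std::string user = "alice";
  LazyRequestInfo info(id, user);
  assert(info.conversions() == 0);
  assert(info.arguments().size() == 2);
  assert(info.arguments()[0].value == "42");
  assert(info.conversions() == 1);
}
```

An interceptor that only handles service contexts would never
call arguments() in this sketch, so it would never pay for the
conversion.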


3.2.3 Challenge: Implementing Time and Space Efficient
Flow Stacks


Context: The Portable Interceptor specification defines
general flow rules to which a portable interceptor implementation
should adhere. These rules ensure that only interceptors
invoked successfully from a starting interception point will ever
be invoked at an ending interception point. Conceptually,
interceptors are pushed onto a stack if invoked successfully in a
starting interception point and popped off that stack when they
are invoked at ending interception points.


Problem: To implement the semantics dictated by CORBA’s 
general flow rules, some type of stack implementation is 
needed. However, implementing a flow stack with a general- 
purpose stack container class, such as the one in the standard 
C++ library [17], has the following problems: 


• Time overhead: The stack implementation may incur
non-trivial performance overhead if it allocates space off of
the heap dynamically for each interceptor or interceptor
reference pushed onto the stack. Dynamic memory is particularly
problematic for real-time applications.


• Space overhead: The stack implementation itself adds
to the ORB footprint since a template must be instantiated for
each type of request interceptor, i.e., client or server request
interceptors. Moreover, other auxiliary templates may need
to be instantiated for internal stack support code. Not only
does this increase the static footprint of the ORB, but it also
increases run-time ORB memory requirements, which may be
unacceptable for embedded applications.


In addition to the inherent problems with real stack
implementations detailed above, another common problem can occur.
Since interceptors are invoked during a request, they are in the
critical path. This means that interceptor support code, such as
a flow stack, can have an adverse effect on performance if that
support is not implemented efficiently. In particular, adding
locking mechanisms in the flow of a request can degrade
performance since threads waiting for a lock can block. The act of
acquiring and releasing the lock also imposes further delays.


Solution → Apply optimization principle patterns: Optimization
principle patterns [18] define a set of principles that
can be applied to improve performance in various ways. To 
implement time and space efficient flow stacks, heap alloca- 
tions must be minimized to avoid degrading performance and 
increasing footprint. Both can be avoided by taking advantage 
of pre-computed resources and the properties associated with 
them. 

As dictated by the Portable Interceptor specification,
interceptors are registered with the ORB when the ORB is
bootstrapped, i.e., during the initial CORBA::ORB_init call.
This means that storage for the interceptors will already have 
been allocated by the time the interceptors are invoked so there 
should ideally be no need for additional allocations at a later 
point in time. 

By keeping the order with which the interceptors are stored 
unchanged for the lifetime of the ORB, it is possible to im- 
plement highly efficient stack push and pop operations. Inter- 
ceptors will always be pushed on to the stack with the same 
relative ordering they are stored in the ORB. This property en- 
sures that the number of elements on the stack will be equal 
to the ORB storage location of the last interceptor pushed on 
to the stack. Hence, the general flow rule semantics can be 
implemented using a logical flow stack. 


Applying the solution to TAO: TAO stores pointers to reg- 
istered interceptors in a pre-allocated array, which avoids in- 
creased footprint and run-time memory requirements. Rather 
than having to instantiate a stack for each type of intercep- 
tor (i.e., client and server request interceptors), a single array 
for each type of request interceptor is created. The order in 
which interceptors are stored in the array remains unchanged 
for the lifetime of the ORB. Thus, push and pop operations can 
be implemented by simply incrementing and decrementing a 
variable, respectively, as illustrated in Figure 12. 

Figure 12: An Efficient Flow Stack Implementation


The following example presents a scenario that illustrates
how TAO’s logical flow stacks are implemented:


1. Three request interceptors are registered when the ORB
is initialized. Specifically, the CORBA::ORB_init method
invokes all ORB initializers registered by the application.
Those ORB initializers then register the interceptors by
using the appropriate methods in the ORBInitInfo argument
passed to the ORB initializer by the CORBA::ORB_init
method. An example of this interceptor registration code
follows:


    // Code that would reside in a
    // concrete implementation of an
    // ORBInitializer::post_init() method, for
    // example.

    // Create and install a client interceptor.
    PortableInterceptor::
      ClientRequestInterceptor_var
        interceptor = new
          Secure_Client_Request_Interceptor;

    // "info" is the ORBInitInfo argument.
    info->add_client_request_interceptor
      (interceptor.in ());


2. Two interceptors are successfully invoked at a starting 
interception point during a request. This corresponds to step 1 
in Figure 12. 


3. Each successful request interceptor invocation incre- 
ments the stack size by one, which results in a stack size of 
two. Stack element one corresponds to request interceptor 
one as stored in the ORB’s interceptor array. Similarly, stack 
element two corresponds to interceptor two in the ORB’s in- 
terceptor array. Again, a logical stack is in use here. This 
corresponds to step 2 in Figure 12. 


4. An ending interception point is invoked. 


5. Within the ending interception point, each of the inter- 
ceptors in the logical stack is invoked. Prior to invoking each 
interceptor, the stack size is decreased by one (step 4 in Fig- 
ure 12), effectively popping an interceptor off of the logical 
flow stack. Since only the first two interceptors were pushed 
on to the stack, only the first two of the three interceptors will 
be invoked (step 5 in Figure 12) in the ending interception 
point and the third interceptor will never be invoked. 
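
The scenario above can be sketched in C++ as a counter over a
fixed array. This is a simplified model, not TAO’s code:
FlowInterceptor, run_starting_point, and run_ending_point are
hypothetical names, each interceptor has only one starting and
one ending hook, and the sketch assumes that the third
interceptor fails at the starting interception point, which is
one way the "two of three succeed" situation can arise.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Log = std::vector<std::string>;

// Hypothetical interceptor: `succeeds` controls whether its starting
// interception point completes successfully.
class FlowInterceptor {
public:
  FlowInterceptor(std::string name, bool succeeds)
      : name_(std::move(name)), succeeds_(succeeds) {}
  bool starting_point(Log& log) {
    log.push_back("start:" + name_);
    return succeeds_;
  }
  void ending_point(Log& log) { log.push_back("end:" + name_); }

private:
  std::string name_;
  bool succeeds_;
};

// Filled once at initialization; the order never changes afterwards.
using InterceptorArray = std::vector<FlowInterceptor*>;

// "Push" is an increment: the returned size is the number of
// interceptors that completed the starting interception point.
std::size_t run_starting_point(const InterceptorArray& registered, Log& log) {
  std::size_t stack_size = 0;
  for (FlowInterceptor* interceptor : registered) {
    if (!interceptor->starting_point(log)) break;  // not pushed
    ++stack_size;
  }
  return stack_size;
}

// "Pop" is a decrement performed before each ending invocation.
void run_ending_point(const InterceptorArray& registered,
                      std::size_t stack_size, Log& log) {
  while (stack_size > 0) {
    --stack_size;
    registered[stack_size]->ending_point(log);
  }
}

// Usage mirroring the scenario: three registered, two pushed.
Log flow_stack_scenario() {
  FlowInterceptor one("one", true), two("two", true), three("three", false);
  InterceptorArray registered = {&one, &two, &three};
  Log log;
  std::size_t stack_size = run_starting_point(registered, log);
  assert(stack_size == 2);
  run_ending_point(registered, stack_size, log);
  return log;
}
```

Because the stack size is a per-request local value in this
sketch and the array is only read, no heap allocation or locking
is needed on the request path.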


TAO’s logical flow stack implementation allows the
CORBA general flow rule semantics to be implemented
efficiently and with minimal impact on ORB footprint. These
benefits arise from the fact that flow stack storage is
pre-allocated prior to the first use of the flow stack. In addition,
the TAO implementation ensures the order of the interceptors
stored in the ORB’s interceptor array remains unchanged for
the lifetime of the ORB.

One other aspect of this implementation is that it is
not necessary to acquire a lock to prevent other threads from
modifying the logical stack. Only one thread ever services a
given request at a time. Thus, there is no need to implement a
locking mechanism for the logical stack, so no additional
locking overhead is incurred.


4 Empirical Benchmarking Results 


Developers of distributed applications must often make trade- 
offs between time/space overhead and flexibility. Selecting 
which meta-programming mechanism to use, e.g., smart prox- 
ies or interceptors, is an example of this tradeoff. This section 
presents benchmarking results that quantify the time/space 
overhead and tradeoffs of using smart proxies and portable in- 
terceptors. 


4.1 Overview of the Testbed Environment and 
Benchmarks 


The experiments were conducted using a Bay Networks
LattisCell 10114 ATM switch connected to two dual-processor
UltraSPARC-2s running SunOS 5.7. Each UltraSPARC-2
contains two 168 MHz CPUs with a 1 Megabyte cache per CPU,
256 Mbytes of RAM, and an ENI-155s-MF ATM adapter card
that supports 155 Megabits per second (Mbps) SONET
multimode fiber. The experimental testbed is shown in Figure 13.
The benchmarking programs were compiled using the Sun CC
5.0 compiler with all optimizations enabled. We conducted
two different benchmarks: one measured the performance of
smart proxies and the other the performance of interceptors.


Figure 13: Testbed for Meta-programming Mechanism
Benchmarks


4.1.1 Smart Proxy Results 


The overhead of calling an operation via a smart proxy is
equivalent to calling the default proxy, i.e., it is the cost of
a local virtual method call. Therefore, we designed our smart
proxy benchmark to show how performance can be improved
if smart proxies are used as a cache to minimize the number of
remote operations. Here is the IDL interface we used for this
test:


interface Broadway_Show
{
  // Get the prices for the box
  // seats of the Broadway show.
  short box_prices ();

  // Order tickets.
  long order_tickets (in short number);
};


The servant in the test is a virtual box office that allows 
clients to purchase tickets to Broadway shows. A client can 
query the prices of box seats and if they are within a price 
range, it buys them. Thus, the client normally makes two in- 
vocations: (1) box_prices and (2) order_tickets if the 
prices are reasonable. By default, every time a client enquires 
about ticket prices, a remote invocation occurs. 

We can minimize overhead significantly by using a smart
proxy that makes just one remote invocation, caches the
result, and reuses it when subsequent enquiries occur. This
caching improves performance significantly, as shown in
Figure 14. This figure illustrates that omitting unnecessary
remote operation calls improves performance by ~130%,
even over a high-speed ATM network.


Figure 14: Performance Improvement Using a Smart Proxy to
Cache Information
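
A caching smart proxy of the kind used in this benchmark could
look roughly like the C++ sketch below. It is not the benchmark
code: BroadwayShowProxy is a hypothetical stand-in for the
IDL-generated default proxy, the price of 125 is a made-up
placeholder value, and the remote_calls() counter merely
simulates the remote invocations. The sketch also assumes
prices never change, so it performs no cache invalidation.

```cpp
#include <cassert>

// Hypothetical stand-in for the default Broadway_Show proxy. Every call
// counts as one simulated remote invocation.
class BroadwayShowProxy {
public:
  virtual ~BroadwayShowProxy() = default;
  virtual short box_prices() {
    ++remote_calls_;
    return 125;  // placeholder price
  }
  virtual long order_tickets(short number) {
    ++remote_calls_;
    return number * 125L;  // placeholder total
  }
  int remote_calls() const { return remote_calls_; }

private:
  int remote_calls_ = 0;
};

// Smart proxy: same interface, but box_prices() goes remote only once.
class CachingShowProxy : public BroadwayShowProxy {
public:
  explicit CachingShowProxy(BroadwayShowProxy* target) : target_(target) {}

  short box_prices() override {
    if (!price_cached_) {
      cached_price_ = target_->box_prices();  // the single remote call
      price_cached_ = true;
    }
    return cached_price_;
  }

  // Orders are never cached; they are always forwarded.
  long order_tickets(short number) override {
    return target_->order_tickets(number);
  }

private:
  BroadwayShowProxy* target_;  // not owned
  bool price_cached_ = false;
  short cached_price_ = 0;
};

// Usage: ten price enquiries cost one simulated remote invocation.
void check_price_cache() {
  BroadwayShowProxy remote;
  CachingShowProxy smart(&remote);
  for (int i = 0; i < 10; ++i) {
    assert(smart.box_prices() == 125);
  }
  assert(remote.remote_calls() == 1);
}
```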


4.1.2 Portable Interceptor Results


Our portable interceptor benchmarks quantify the cost of
supporting and using interceptors in TAO. Moreover, these tests
quantify the costs of individual interceptor features, such as
accessing a parameter list and accessing a service context list.
In the benchmark program, the following three IDL operations 
were defined in the Secure_Vault interface: 


interface Secure_Vault
{
  exception Invalid {};

  struct Record { long check_num; long amount; };

  // No args/exceptions operation.
  short ready ();

  // Throws a user exception.
  void authenticate (in string user)
    raises (Invalid);

  // Updates a struct and returns a count.
  long update_records (in long id,
                       in Record val);
};


Each operation takes a different number and different 
length of parameters and return values. Moreover, the 
authenticate operation throws a user exception, whereas 
the other two do not. This diversity allowed us to measure 
the cost of preparing different types of generic information re- 
quired by interceptors. 

The interceptor benchmarks were run using the five differ- 
ent configurations summarized below: 


1. No interceptor support: In this configuration, inter- 
ceptor support was disabled completely in the ORB, which 
measured TAO’s baseline performance. 


2. No interceptor installed: This time the ORB was com- 
piled with interceptor support, although the test was performed 
without installing an interceptor into the ORB. This configura- 
tion measures the performance penalty applications must pay 
for the potential of flexibility. 


3. No-op interceptor installed: This configuration uses a 
no-op interceptor to measure the cost of invoking interceptors. 


4. Accessing the service context list: The interceptor in- 
stalled in this configuration manipulates the GIOP request’s 
ServiceContextList. On the client, a request inter- 
ceptor creates a new ServiceContext containing an en- 
capsulated password string of 7 bytes and inserts the ser- 
vice context object into the ServiceContextList of 
the invocation using the RequestInfo interface. On the 
server, a different request interceptor performs the reverse 
operation by (1) extracting the password string from the 
ServiceContextList using the RequestInfo inter- 
face and (2) examining the password via a string comparison. 




5. Accessing Dynamic information: TAO implements
the Dynamic module types used in request/reply operations,
such as the parameters, results, and exception list of an
invocation, by creating this information on demand. The
interceptor installed in this configuration accesses the dynamic
information of the operations by checking their parameters and
return values.


Figure 15 shows the cost of supporting and using these
various features and configurations in interceptors. In the first
configuration (no interceptor support), all three measured
operations perform similarly because there is no significant
difference between the information these operations exchange. The
results are similar for the second configuration, which added
interceptor support to the ORB but without installing any
interceptors. There is only a ~9% performance penalty for using
the ORB with interceptor support.


[Plot axes: Throughput (events/sec), Operations, Interceptor Types]
Figure 15: Cost of Using Various Interceptor Features

The no-op interceptor provides the baseline cost of invoking
an interceptor. There is a ~26% performance penalty
compared to not installing the interceptor, due to the invocation
of interception points on every operation invocation. As shown
in Figure 15, however, all three operations reveal similar
performance characteristics, regardless of the number and size of
their parameters and return values.

Similar performance degradation is also observed for inter- 
ceptors that access the ServiceContextList. This con- 
figuration measures the cost of adding and extracting a short 
string from the ServiceContext. Again, all three opera- 
tions experience ~8% degradation in performance compared 
to using the no-op interceptor. 

The interceptor that accesses the Dynamic module types,
however, demonstrates more diversity in performance
degradation among the three operations we tested. There are
~7%, ~19%, and ~40% performance hits to the ready,
authenticate, and update_records operations, respectively,
compared with the no-op interceptor configuration. The
performance penalty comes not only from accessing
parameters using the Dynamic module types, but also from the
on-demand creation of the dynamic information. The results
show that the preparation of Dynamic module types is
expensive, which justifies our decision not to create them if they
are not accessed by interceptors.


4.2 Memory Footprint Results 


TAO is an open-source ORB that is used for real-time and em- 
bedded systems with memory constraints. Therefore, smart 
proxies and interceptors can be conditionally compiled in or 
out at ORB compile-time. To measure the memory increment 
necessary to support smart proxies and interceptors, we com- 
piled the Secure_Vault IDL interface shown above with 
three different operations using the following configurations: 


1. Interceptors and smart proxies both disabled;


2. Interceptors and smart proxies both enabled;


3. Interceptors enabled but smart proxies disabled, which is
the default configuration in TAO; and


4. Interceptors disabled and smart proxies enabled.


Table 1 shows the resulting sizes for different configurations.
Not counting the application-specific proxy and factory
method, smart proxies increase TAO’s client memory footprint
by ~2.5%. In contrast, interceptors require ~15% extra
footprint to handle on-demand creation of parameter lists,
exception lists, etc.


Supporting     | Stub        | % Inc.      | Skeleton    | % Inc.
---------------+-------------+-------------+-------------+------------
Neither        | [illegible] | [illegible] | [illegible] | [illegible]
Smart proxies  | 1321        | 2.5         | 1277        | 0
Interceptors   | 1479        | 14.8        | 1485        | 16.3
Both           | [illegible] | 17.8        | 1489        | 16.6


Table 1: Footprint Comparison for Smart Proxies and
Interceptors

We also performed the same test using the OMG Minimum
CORBA configuration [19], which defines a subset of the
complete CORBA specification to reduce embedded system
memory footprints. By default, TAO’s Minimum CORBA
footprint is less than 1 MB. To determine the footprint growth
when smart proxies and/or interceptors are used, we measured
the size of the ORB again using the same IDL interface, as
shown in Table 2. The footprint increase for TAO’s smart
proxies in this configuration is 5.55% and the support for
interceptors causes a significant 20-23% increment. These results
are not surprising since both these meta-programming features
are new and have not yet been optimized for TAO’s Minimum
CORBA configuration.


Table 2: Footprint Comparison for Smart Proxies and
Interceptors in TAO’s Minimum CORBA Configuration

In general, the results in this section show that CORBA 
meta-programming mechanisms can provide developers with 
significant improvements in functionality, performance, and 
convenience without drastic changes to existing application 
software. Depending on which features are used, however,
developers need to consider the effect of time and space
overhead.


5 Related Work 


CORBA is increasingly being adopted as the middleware of 
choice for a wide range of distributed applications and
systems. As systems evolve, new features/services will be added
to the system. Smart proxies and interceptors are good ways 
to adapt existing applications to take advantage of these new 
features. The following work on middleware technologies is 
related to our research. 


QuO: The Quality Objects (QuO) distributed object mid- 
dleware is developed at BBN Technologies [20] by apply- 
ing Aspect-Oriented Programming (AOP) [21] techniques to 
adaptive network applications. QuO is based on CORBA and 
supports: 


1. Run-time performance tuning and configuration 
through the specification of operating regions, behavior 
alternatives, and reconfiguration strategies that allow the
QuO run-time to adaptively trigger reconfiguration as sys- 
tem conditions change, represented by transitions between 
operating regions; and 

2. Feedback across software and distribution boundaries 
based on a control loop in which client applications and server 
objects request levels of service and are notified of changes in 
service. 


QuO achieves this functionality via customized smart
proxies, called delegates, and embedded MOP interfaces within
the proxies. However, their framework does not allow users to
install user-defined proxies and the MOP interfaces are
specifically designed for QoS purposes.


Orbix filters: Orbix defines the concept of filters, which are
an interceptor mechanism based on the concept of “flexible 
bindings” [22]. By deriving from a predefined base class, de- 
velopers can intercept events. Common events include client- 
initiated transmission and arrival of remote operations, as well 
as the object implementation-initiated transmission and arrival 
of replies. Developers can choose whether to intercept the 
request or result before or after marshaling. Orbix program- 
mers can leverage the same filtering mechanism to build multi- 
threaded servers [23, 24, 25]. 


dynamicTAO: The dynamicTAO reflective ORB [26] sup- 
ports interceptors for monitoring and security. Particular in- 
terceptor implementations are loaded into dynamicTAO using 
the Component Configurator pattern [10]. Using component 
configurators to install interceptors in dynamicTAO allows ap- 
plications to exchange monitoring and security strategies at 
run-time. Moreover, dynamicTAO makes extensive use of
reflective programming techniques to determine the modules
the ORB requires.


Fault-tolerant ORB frameworks: Interceptors have been 
applied in a number of fault-tolerant ORB frameworks, such as
the Eternal system [27]. Eternal intercepts system calls made
by clients through the lower-level I/O subsystem and maps
these system calls to a reliable multicast subsystem. Eternal 
does not modify the ORB or the CORBA language mapping, 
thereby ensuring the transparency of fault tolerance from ap- 
plications. 


COM interceptors: Hunt and Scott [28] describe how to 
implement interceptors in COM. The concept they use to im- 
plement interceptors is similar to TAO’s collocated stub [15]. 
This technique uses alternative wrappers around the object
implementation to masquerade as operation targets, which are
similar to TAO’s smart proxies. 


6 Concluding Remarks 


Distributed object computing (DOC) middleware has been ap- 
plied widely to domains ranging from telecommunications to 
aerospace, process automation, and e-commerce. DOC mid- 
dleware shields developers from many distribution challenges 
and allows applications to invoke operations on target objects 
efficiently without concern for their location, programming 
language, OS platform, communication protocols and inter- 
connects, and hardware [29]. Historically, however, many 
DOC middleware solutions have tightly coupled interfaces and 
implementations, which makes it hard to adapt to requirement 
or environment changes that occur late in an application’s
life-cycle, i.e., during deployment and/or at run-time.


Meta-programming mechanisms are techniques that help increase
the flexibility and adaptability of applications without
degrading performance significantly. This paper describes
two meta-programming mechanisms, smart proxies and
interceptors, that we added recently to TAO, an implementation
of CORBA that is targeted for applications with
high-performance and real-time QoS requirements. These two 
mechanisms allow CORBA applications to adapt to changing 
requirements or environmental conditions that occur late in an 
application’s life-cycle without requiring obtrusive changes in 
existing software. 


Based on our experience using smart proxies and intercep- 
tors to develop TAO applications, we have observed the fol- 
lowing tradeoffs and limitations with smart proxies and inter- 
ceptors: 


Performance: Interceptors incur more overhead than smart 
proxies because they influence the processing of operations at 
multiple points along the invocation path. The portable inter- 
ceptor results in Section 4.1.2 illustrate the overhead of sup- 
porting interceptors and the run-time costs of specific inter- 
ceptor features. 


In general, smart proxies perform better and consume less 
memory than interceptors. The smart proxy results in Sec- 
tion 4.1.1 show the circumstances where using smart proxies 
can improve performance. Even though there is an extra layer
of indirection, the overall performance can be improved by 
removing the gratuitous overhead of unnecessary remote in- 
vocations. 
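
The point about removing unnecessary remote invocations can be illustrated with a small sketch. It is written in Java for brevity rather than in TAO's C++, and every name in it (Quoter, RemoteQuoterStub, CachingQuoterProxy) is hypothetical. It only shows how a client-side proxy with the same interface as the default stub might answer repeated requests locally; it is not TAO's smart proxy API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical remote interface; stands in for an IDL-generated stub type. */
interface Quoter {
    int getQuote(String stock);
}

/** Stand-in for a default stub: every call is counted as a remote invocation. */
final class RemoteQuoterStub implements Quoter {
    int remoteCalls = 0;

    @Override
    public int getQuote(String stock) {
        remoteCalls++;
        return stock.length() * 10; // placeholder for the value a server would return
    }
}

/** Smart proxy: same interface, but repeated requests are served from a cache. */
final class CachingQuoterProxy implements Quoter {
    private final Quoter target;
    private final Map<String, Integer> cache = new HashMap<>();

    CachingQuoterProxy(Quoter target) {
        this.target = target;
    }

    @Override
    public int getQuote(String stock) {
        // Only the first request for a given stock reaches the target.
        return cache.computeIfAbsent(stock, target::getQuote);
    }
}

final class SmartProxyDemo {
    /** Issues three identical requests through the proxy and reports the remote call count. */
    static int remoteCallsForThreeRequests() {
        RemoteQuoterStub stub = new RemoteQuoterStub();
        Quoter proxy = new CachingQuoterProxy(stub);
        proxy.getQuote("ACME");
        proxy.getQuote("ACME");
        proxy.getQuote("ACME");
        return stub.remoteCalls;
    }

    public static void main(String[] args) {
        System.out.println("remote calls: " + remoteCallsForThreeRequests());
    }
}
```

The extra indirection costs one map lookup per call, which is the trade described above: a small local overhead in exchange for fewer remote invocations. A real proxy would also need a policy for invalidating stale entries, which this sketch omits.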


Generality: Interceptors can be applied to either servers or 
clients and can access operation-specific information. Therefore,
they provide an effective meta-programming mechanism
to handle advanced features, such as authentication and autho- 
rization, transparently end-to-end. In contrast, smart proxies 
only apply to specific interfaces accessed by clients. In par- 
ticular, smart proxies can only influence the behavior at the 
beginning of an invocation. 


Portability: Smart proxies are not currently part of the 
CORBA standard. Although many ORBs provide smart prox- 
ies as extensions, this feature is not portable. There is, how- 
ever, a Portable Interceptors specification [7] that is being rat- 
ified by the OMG. 


All the source code, documentation, and tests for 
TAO are open-source and can be downloaded from 
www.cs.wustl.edu/~schmidt/TAO.html.
Kava - Using Byte code Rewriting to add Behavioural Reflection to Java

Abstract


Many authors have proposed using byte code rewriting as a way of adapting or extending the behaviour of Java 
classes. There are toolkits available that simplify this process and raise the level of abstraction above byte code. 
However, to the best of our knowledge, none of these toolkits provide a complete model of behavioural reflection 
for Java. In this paper, we describe how we have used load-time byte code rewriting techniques to construct a run- 
time metaobject protocol for Java that can be used to adapt and customise the behaviour of Java classes in a more 
flexible and abstract way. Apart from providing a better semantic basis for byte code rewriting techniques, our ap- 
proach also has the advantage over other reflective Java implementations that it doesn't require a modified com- 
piler or JVM, can operate on byte code rather than source code and cannot be bypassed. In this paper we describe 
the implementation of Kava, our reflective implementation of Java, and discuss some of the linguistic issues and 
technical challenges involved in implementing such a tool on top of a standard JVM. Kava is available from 
http://www.cs.ncl.ac.uk/research/dependability/reflection. 


1. Introduction 


Many authors have considered the problem of reusing 
third party code in environments the developers did not 
originally consider [1, 2, 3]. For example, some pro- 
posals suggest ways to apply access control policies to 
code that has been developed without any thought for 
security [3]. Wrapping was originally proposed as a 
technique to enable adaptation of the code but it suffers 
from a number of problems such as identity confusion, 
or the self problem [1] etc. A solution to the problem is to
transform the code at the binary level [4]. This has
proved to be a practical technique in the context of 
Java because the Java byte code retains a large amount 
of semantic information. 
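
The self problem mentioned above can be shown with a minimal sketch. The class names (Account, LoggingAccountWrapper) are hypothetical and chosen only for illustration. The wrapper tries to observe every deposit, but calls that the wrapped object makes on itself never pass through the wrapper.

```java
/** Base class whose public method calls another of its own methods. */
class Account {
    int balance = 0;

    void deposit(int amount) {
        balance += amount;
    }

    void depositTwice(int amount) {
        // Self-calls: dispatched on this Account, not on any wrapper around it.
        this.deposit(amount);
        this.deposit(amount);
    }
}

/** Wrapper that tries to observe every deposit by delegating to the wrapped object. */
class LoggingAccountWrapper extends Account {
    private final Account wrapped;
    int observedDeposits = 0;

    LoggingAccountWrapper(Account wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    void deposit(int amount) {
        observedDeposits++;
        wrapped.deposit(amount);
    }

    @Override
    void depositTwice(int amount) {
        // The two deposits performed inside the wrapped object bypass the wrapper.
        wrapped.depositTwice(amount);
    }
}

final class SelfProblemDemo {
    /** Returns how many deposits the wrapper observed for one depositTwice call. */
    static int observedDepositsForDepositTwice() {
        LoggingAccountWrapper w = new LoggingAccountWrapper(new Account());
        w.depositTwice(5);
        return w.observedDeposits;
    }

    public static void main(String[] args) {
        System.out.println("observed deposits: " + observedDepositsForDepositTwice());
    }
}
```

Two deposits take place but the wrapper observes none of them, which is why adaptation that must not be bypassed is better placed inside the class itself.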


A number of byte code rewriting tools have been de- 
veloped to ease the process of code rewriting. These 
include JOIE [5], Byte Code Engineering Library [6] 
and more recently Javassist [7]. Each toolkit provides 
object oriented representations of the structure of 
classes that can be used to rewrite classes on-the-fly. 


The focus of these toolkits is on implementing changes 
to the behaviour of classes through programs that re- 
write the class implementations. Users typically have to 
write programs that walk class structures and locate the 
appropriate places to make changes to the structure in 
order to implement some change to runtime behaviour. 
The actual implementation of changes this way is diffi- 
cult for most programmers and highly error prone. 


We argue that for applications where non-functional 
concerns are being implemented (security, transac- 
tions, debugging etc.) it would be more natural to spec- 
ify changes to the behaviour of classes in terms of run- 
time abstractions. 


For example, in order to trace state changes it would be 
more natural to redefine the runtime state access opera- 
tion rather than manually write a program that walks 
all the methods of a class file and instruments pertinent 
field access operations. 


Metaobject protocols and reflection are a good model 
for expressing such changes. Metaobject protocols pro- 
vide abstractions of the runtime environment, and ex- 
pose the protocols governing the execution in the run- 
time environment. Reflection means that changes to 
the implementation of these metaobject protocols will 
change the way in which code is executed at runtime. 


We have developed a highly portable implementation
of behavioural reflection for Java called Kava
[8]. It provides a metaobject protocol for specifying 
changes to runtime behaviour and implements these 
changes through the use of structural rewriting toolkits 
such as JOIE, Byte Code Engineering Library, or 
Javassist. It is portable, is written entirely in Java, and 
unlike a number of other reflective Java implementa- 
tions doesn't require a specialised Java Virtual Ma- 
chine. Kava also provides support for properties such 
as strong non-bypassability and 
public class TraceMethod implements Constants {
  private static String          class_name;
  private static ConstantPoolGen cp;
  private static int             out;     // reference to System.out
  private static int             println; // reference to PrintStream.println

  private static Method traceMethod(Method m) {
    Code   code  = m.getCode();
    int    flags = m.getAccessFlags();
    String name  = m.getName();

    // Create instruction list to be inserted at method start.
    String          mesg  = "tracing " + name;
    InstructionList patch = new InstructionList();
    patch.append(new GETSTATIC(out));
    patch.append(new PUSH(cp, mesg));
    patch.append(new INVOKEVIRTUAL(println));

    MethodGen           mg  = new MethodGen(m, class_name, cp);
    InstructionList     il  = mg.getInstructionList();
    InstructionHandle[] ihs = il.getInstructionHandles();

    // First let the super or other constructor be called
    if(name.equals("<init>")) {
      for(int j=1; j < ihs.length; j++) {
        if(ihs[j].getInstruction() instanceof INVOKESPECIAL) {
          il.append(ihs[j], patch); // Should check: method name == "<init>"
          break;
        }
      }
    }
    else
      il.insert(ihs[0], patch);

    // update stack size
    if(code.getMaxStack() < 2)
      mg.setMaxStack(2);

    return mg.getMethod();
  }

  public static void main(String[] argv) throws Exception {
    JavaClass    java_class = new ClassParser(argv[0]).parse();
    ConstantPool constants  = java_class.getConstantPool();

    class_name = java_class.getClassName();
    cp         = new ConstantPoolGen(constants);
    out        = cp.addFieldref("java.lang.System", "out",
                                "Ljava/io/PrintStream;");
    println    = cp.addMethodref("java.io.PrintStream",
                                 "println",
                                 "(Ljava/lang/String;)V");

    Method[] methods = java_class.getMethods();
    for(int j=0; j < methods.length; j++)
      methods[j] = traceMethod(methods[j]);

    java_class.setConstantPool(cp.getFinalConstantPool());
    java_class.dump(java_class.getClassName() + ".class");
  }
}




Figure 1 — Tracing Method Execution 
reflection on inherited methods that other reflective
Java implementations do not address. 


In section 2 we discuss byte code rewriting and its 
shortcomings, in section 3 we introduce the Kava sys- 
tem, in section 4 we provide some examples of its ap- 
plication, in section 5 we discuss the implementation of 
Kava, in section 6 we provide an overview of related 
work and finally in section 7 we give our conclusions 
and outline future work. 


2. Bytecode Rewriting 


There are three main toolkits for rewriting bytecodes: 
JOIE, Byte Code Engineering Library and Javassist. The
first two toolkits provide object oriented frameworks 
for writing programs that manipulate the structure of 
class files. They provide loadtime representations of 
elements of class files such as methods, types, instruc- 
tions etc. Java programs can then be written that de- 
scribe how class files can be rewritten as late as load 
time. The main drawback with this approach is that the 
programmer has to have a detailed understanding of 
both the structure of class files and Java virtual machine
programming. As the authors of JOIE have observed,
this makes it difficult for programmers to write
reliable and easily understandable transformer pro- 
grams. Javassist attempts to address this problem by 
providing a metaobject protocol for the rewriting of 
byte codes. It allows a programmer to work at a more 
abstract level. However, it sacrifices some of the power 
of the other toolkits without gaining a high enough 
level of abstraction. Also, it still requires the pro- 
grammer to think in terms of reprogramming an exist- 
ing implementation. 


Figure 1 shows how the Byte Code Engineering Li- 
brary can be used to trace method execution of a class. 


This code adds a print statement at the start of each
method. The traceMethod method generates the
appropriate byte code for a print statement, locates the
appropriate place to insert the instructions, and finally
ensures that the stack size after insertion is correct,
while the main method adds the required entries to the
constant pool and traverses the methods of the class.


This process is obviously difficult for novice program- 
mers to learn and is error prone. It is difficult as the 
code to be inserted is developed by hand and the pro- 
grammer must manually add the appropriate entries to 
the constant pool. It is error prone because there is no 
separate type checking available for the code to be inserted.
In addition to writing the code to be inserted, the
code for performing the insertion also has to be written
from scratch every time, and issues such as ensuring
that the stack size is maintained correctly have to be
addressed by the programmer.


To address these concerns, two improvements are 
needed: 


• The ability to write the behavioural modifications
in Java, and to be able to compile and verify these
modifications as you would a normal class.


• The ability to declaratively specify where the
behavioural modifications should be applied.


Kava provides these improvements. Behavioural adap- 
tations are implemented using metaobject classes that 
can be compiled and verified, and the application of 
the metaobjects is driven by a binding specification 
that uses a declarative binding language. 


3. Using Kava 


In this section we introduce the basic concepts of be- 
havioural reflection, and describe how Kava is actually 
used. 


3.1. Behavioural Reflection 


Reflection [9] is the process by which a system can 
reason about and act upon itself. A reflective system is 
composed of a base level and a meta level. The base 
level is the system being reasoned about, and the meta 
level has access to representations of the base level. 
Reification is the process by which the abstract repre- 
sentations of the base level are generated. A reflective 
system has the property that the meta level is causally 
connected to the base level. This means that changes at 
the meta level cause changes to the behaviour of the 
base level. 


These notions of reflection have been extended to in- 
clude the concept of the metaobject protocol [10] 
where an abstraction of the computation process and 
the protocols governing the execution of the program 
are exposed. A metaobject is bound to an object and 
controls the execution of the object. By changing the 
implementation of the metaobject the object's execution 
can be adjusted in a principled way. The protocols are 
implemented as methods of the metaobject. 


Reflection and metaobject protocols have been success- 
fully used to implement non-functional properties such 
public interface IMetaObject {

  public void beforeExecuteMethod(IExecutionContext context);
  public void afterExecuteMethod(IExecutionContext context);
  /* called when a method is executed (including constructor/finalizer) */

  public void beforePutField(IFieldContext context);
  public void afterPutField(IFieldContext context);
  /* called when a field is accessed */

  public void beforeGetField(IFieldContext context);
  public void afterGetField(IFieldContext context);
  /* called when a field is read */

  public void beforeInvoke(IInvocationContext context);
  public void afterInvoke(IInvocationContext context);
  /* called when a method is invoked (including initialiser) */

  public void beforeException(IExceptionContext context);
  public void afterException(IExceptionContext context);
  /* called when an exception is thrown and caught */
}




Figure 2 — Interface for Kava MetaObject 


as concurrent programming [11], atomic data types 
[12], fault tolerance [13], and security [14].


The Java programming language [15] includes a 
reflection package. This provides the ability to reify 
some aspects of the Java runtime environment such as 
methods, classes, fields, etc. and allows dynamic 
construction of proxies and dynamic method 
invocation. However, it does not provide the ability to 
modify the behaviour of an application through 
changes at a meta level. Kava provides powerful 
behavioural reflection without requiring changes to the 
Java Virtual Machine or requiring the use of source 
code preprocessing. It implements behavioural 
reflection through the principled rewriting of Java class 
files. 
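
The capabilities and limits of the standard reflection package can be seen in a short sketch. It uses only java.lang.reflect; the Greeter and SimpleGreeter types are hypothetical and exist only for this illustration. It shows dynamic method invocation and a dynamic proxy, and why the proxy is not behavioural reflection: only calls made through the proxy reference are intercepted.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical interface used only for this illustration. */
interface Greeter {
    String greet(String name);
}

final class SimpleGreeter implements Greeter {
    @Override
    public String greet(String name) {
        return "hello " + name;
    }
}

final class ReflectionDemo {
    /** Dynamic method invocation: the method is reified as a Method object and called. */
    static String invokeByName(Object target, String methodName, String arg) {
        try {
            Method m = target.getClass().getMethod(methodName, String.class);
            return (String) m.invoke(target, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Dynamic proxy: records each call made through the returned reference. */
    static Greeter tracing(Greeter target, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add("tracing " + method.getName());
            return method.invoke(target, args);
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(), new Class<?>[] {Greeter.class}, handler);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Greeter plain = new SimpleGreeter();
        Greeter traced = tracing(plain, log);
        System.out.println(invokeByName(plain, "greet", "World"));
        System.out.println(traced.greet("World"));
        plain.greet("World"); // a direct call on the original object is not intercepted
        System.out.println(log);
    }
}
```

The proxy is a separate object, so any code holding a reference to the original SimpleGreeter bypasses the handler. Rewriting the class file itself is what removes that gap.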


The Kava system allows each object or class to be 
bound to a metaobject. At the meta level runtime be- 
haviours such as method invocation, method execution, 
field access, etc. can be redefined by the metaobject 
implementation. The metaobject implementation is 
constructed using reified aspects of the runtime object 
model. For example, a method is reified as an instance 
of a Method class. 


The binding itself is described by a binding specifica- 
tion. This is written using a declarative binding lan- 
guage. Separating the binding information from the 
metaobjects increases the reusability of metaobjects as 
the bindings effectively parameterise the metaobjects. 
For example, a binding specification may bind a 
metaobject to different fields on different classes. 


3.2 Using Kava 

Each metaobject is an implementation of the interface 
IMetaObject. This defines a series of methods for 
intercepting and customising various aspects of the 
runtime behaviour of an object. See Figure 2 for the 
interface. 


Each method has a before and after variant. The before 
methods are invoked before the behaviour, and the 
after methods are invoked after the behaviour. Each 
time a metaobject’s method is invoked the behaviour’s 
context is reified as an instance of a context object and 
passed as an argument. This makes the context acces- 
sible to the metaobject implementation. Some aspects 
of the context can be changed at the metalevel, such as 
the actual arguments passed to a method. On return to 
the base level the context object is converted back to 
the actual context of the behaviour. 


Each before method can set the context such that the 
base level behaviour is overridden. This means that the
base level behaviour will be suppressed. For example 
setting an override in a beforeExecuteMethod 
will result in the body of the method not being exe- 
cuted. 
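
The before/after protocol and the override flag can be sketched as follows. This is a minimal sketch and not Kava's actual API: ExecContext, ExecMeta and Greeting are hypothetical stand-ins, and the hook calls are written by hand where Kava would insert them by rewriting byte code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a reified execution context. */
final class ExecContext {
    final String methodName;
    final Object[] args;       // a metaobject may replace these
    private boolean override;  // set by a before method to suppress the body

    ExecContext(String methodName, Object... args) {
        this.methodName = methodName;
        this.args = args;
    }

    void setOverride(boolean value) { override = value; }
    boolean override() { return override; }
}

/** Hypothetical metaobject interface with a single before/after pair. */
interface ExecMeta {
    void beforeExecuteMethod(ExecContext c);
    default void afterExecuteMethod(ExecContext c) { }
}

/** Base level class written the way an instrumented class might look. */
final class Greeting {
    final List<String> output = new ArrayList<>();
    private final ExecMeta meta;

    Greeting(ExecMeta meta) { this.meta = meta; }

    void run(String s) {
        ExecContext c = new ExecContext("run", s);
        meta.beforeExecuteMethod(c);
        if (!c.override()) { // skipped when the metaobject overrides
            output.add("hello " + c.args[0]);
            meta.afterExecuteMethod(c);
        }
    }
}

final class OverrideDemo {
    /** Runs the base level method under the given metaobject and returns its output. */
    static List<String> runWith(ExecMeta meta) {
        Greeting g = new Greeting(meta);
        g.run("World");
        return g.output;
    }

    public static void main(String[] args) {
        System.out.println(runWith(c -> { }));                  // behaviour unchanged
        System.out.println(runWith(c -> c.args[0] = "Kava"));   // argument replaced
        System.out.println(runWith(c -> c.setOverride(true)));  // body suppressed
    }
}
```

Keeping the flag on the context object rather than on the metaobject means that each intercepted behaviour carries its own decision, so one metaobject can serve many objects.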


An example of a metaobject that implements the trac- 
ing of method executions similar to the example given 
in section 2 is: 
public class MetaTrace
    implements IMetaObject {
  public void
  beforeExecuteMethod(
      IExecutionContext context) {
    System.out.println(
      "tracing " +
      context.getMethodName());
  }
}


In order to trace the methods of a particular class, it is
necessary to establish a binding between instances of
the class and instances of the MetaTrace class. 
These bindings are described using the Kava binding 
language in a special metaconfiguration file that drives 
the processing of a class by Kava. The binding specifi- 
cation shown below means that MetaTrace inter- 
cepts the execution of any method of the class Test. 


<binding> 
<class> 
<classname>Test</classname> 
<metaclass>MetaTrace</metaclass> 
<intercept> 
<execute> 
<method>*</method> 
<parameters>*</parameters> 
</execute> 
</intercept> 
</class> 
</binding> 


If the implementation of Test is: 


public class Test {
  public static
  void main(String[] args) {
    (new Test()).run(args[0]);
  }
  public void run(String s) {
    System.out.println("hello " + s);
  }
}


Then the output of invoking the run method of Test with
the actual parameter World is:


tracing run 
hello World 


Note that the code necessary to implement tracing be- 
haviour is significantly more concise than the equiva- 
lent byte code transformation code. The metaobject that 
specifies the code to be invoked when a method is executed
can also be compiled and verified, therefore reducing
the possibility of coding errors. The binding
specification is significantly shorter than the code that 
traverses the class and inserts instructions at the ap- 
propriate place. Also, since it is a declarative specifica- 
tion it is easier to code and less likely to contain errors. 


Kava is well suited to modifying the behaviour of
classes where the interface of the class is not to be
changed and no new keywords are to be added to the language.
As this example shows it is far more concise than an 
equivalent byte code transformation program, and it 
separates out the adaptation code (the metaobject) and 
the specification of where to apply the adaptation (the 
binding). 


4. Examples 

This section shows applications of Kava that highlight
some of the more unusual features of the Kava metaobject
protocol. Many implementations of reflective Java
concentrate on intercepting method calls, and tracing
method calls is the standard example used to demonstrate
a reflective system. In addition to the interception
of method calls, Kava provides the ability to intercept
the sending of method calls (invocation), field
access, and exception handling. The first example given
here is fine grained access control, which illustrates
Kava's ability to control field access. The second example
shows how to prevent a particular type of denial
of service attack, which illustrates Kava's ability to
intercept the sending of method calls.


4.1 Fine grained access control 

The Java programming language provides the follow- 
ing language level mechanisms for controlling access 
to class members such as methods or fields: 


• Public access where code belonging to any class is
allowed to access the member.


• Package access where access to the member is
permitted only to code belonging to classes in the
same package.


• Protected access where access to the member is
permitted only to code that inherits from the
functionality of the class.


• Private access where access to the member is
permitted only to code that occurs in the body of the
top level class that encloses the declaration of the
member.


While this is adequate for a number of situations there 
is still the possibility that a more fine grained access 
control may be required for security purposes. For ex- 
ample, we may only want a certain field to be accessed 
by a limited number of classes that are spread across 
multiple packages. 
Using Kava it is relatively simple to implement such a
fine-grained scheme. In this example we focus on pre- 
venting access to fields by any but a small number of 
classes. 


We implement the following protection metaobject 
MetaChkAccess that restricts access to a field to 
instances of two known classes GoodGuy1 and
GoodGuy2:


public class MetaChkAccess implements
    IMetaObject {

  public void
  beforePutField(IFieldContext c) {
    checkAccess(c.getBase());
  }
  /* check any writes to the field */

  public void
  beforeGetField(IFieldContext c) {
    checkAccess(c.getBase());
  }
  /* check any reads from the field */

  public void
  checkAccess(Object who) {
    if (who instanceof GoodGuy1 ||
        who instanceof GoodGuy2) {
      return;
    }
    else {
      // wrong class
      throw new
        SecurityException(
          "illegal access by " +
          who.getClass().getName());
    }
  }
  /* check whether access is allowed */
}


The checkAccess method checks any access to a 
field. If a class other than one of the allowed classes 
attempts to access a field then a SecurityExcep- 
tion is thrown. As SecurityException is a 
subclass of RuntimeException it does not have to 
be included in the declaration of the method. 
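
The checking logic can be exercised on its own, outside Kava. The sketch below is a hypothetical variant, not the authors' code: AccessPolicy takes the allowed classes as constructor arguments instead of hard-coding them, and GoodGuy1, GoodGuy2 and Stranger are empty placeholder classes. It relies on SecurityException being unchecked, as noted above.

```java
/** Placeholder classes standing in for the permitted and non-permitted callers. */
final class GoodGuy1 { }
final class GoodGuy2 { }
final class Stranger { }

/** Hypothetical policy object: permits access only to instances of the listed classes. */
final class AccessPolicy {
    private final Class<?>[] allowed;

    AccessPolicy(Class<?>... allowed) {
        this.allowed = allowed.clone();
    }

    /** Throws SecurityException (unchecked) unless 'who' is an instance of an allowed class. */
    void checkAccess(Object who) {
        for (Class<?> c : allowed) {
            if (c.isInstance(who)) {
                return;
            }
        }
        throw new SecurityException("illegal access by " + who.getClass().getName());
    }
}

final class AccessDemo {
    /** Converts the exception-based check into a boolean for demonstration purposes. */
    static boolean permitted(AccessPolicy policy, Object who) {
        try {
            policy.checkAccess(who);
            return true;
        } catch (SecurityException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        AccessPolicy policy = new AccessPolicy(GoodGuy1.class, GoodGuy2.class);
        System.out.println(permitted(policy, new GoodGuy1()));
        System.out.println(permitted(policy, new Stranger()));
    }
}
```

Passing the permitted classes in as data keeps the check reusable across fields, in the same spirit as separating the binding from the metaobject.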


The MetaChkAccess metaobject is then bound to 
any class that reads from or writes to the field pro- 
tectedField of the class ProtectedClass by 
including the following in the binding specification: 


<binding> 
<class> 
<classname>*</classname> 
<metaclass>MetaChkAccess</metaclass> 
<intercept> 
<getfield>
<class>ProtectedClass</class>
<field>protectedField</field>
</getfield>
<putfield>
<class>ProtectedClass</class>
<field>protectedField</field>
</putfield>
</intercept> 
</class> 
</binding> 


4.2 Denial of Service 

Denial of service attacks are trivial to implement in 
Java. The simplest attacks consume resources by gen- 
erating infinite numbers of objects such as windows 
that fill up the user's screen and occupy CPU time. 
Trivially this could be dealt with by defining a 
MetaRsrceLmt that watches how many instances of
windows (all subclasses of java.awt.Frame) are 
created and limiting the number that can be created to 
a maximum: 


public class MetaRsrceLmt implements
    IMetaObject
{
  public void
  beforeInvoke(IInvocationContext c)
  {
    if (c.getTarget() instanceof
          java.awt.Window &&
        c.getMethod().equals("<init>"))
      maxCount++;

    if (maxCount > ARBITARY_MAX)
    {
      throw new
        RuntimeException(
          "exceeded max number of frames");
    }
  }
}


This metaobject then would be bound to all method 
invocations by any method of any class: 


<binding> 
<class> 
<classname>*</classname> 
<metaclass>MetaRsrceLmt</metaclass>
<intercept> 
<invoke> 
<method>*</method> 
<parameters>*</parameters> 
<class>*</class> 
<targetmethod> 
<init/> 
</targetmethod> 
</invoke> 
</intercept> 
</class> 
</binding> 

A more sophisticated metaobject would block the creation
of new windows until an existing one was destroyed, and
would maintain a global count of windows and detect when
windows were destroyed as well as created. However,
this example shows that resource creation can be easily
controlled using a reflective approach.
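
The counting logic behind such a limit can be sketched independently of Kava. ResourceLimiter below is a hypothetical helper, not part of Kava. It tracks both creation and destruction, as the more sophisticated variant would, but it rejects a request over the limit instead of blocking.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper that bounds the number of live resources of some kind. */
final class ResourceLimiter {
    private final int max;
    private final AtomicInteger live = new AtomicInteger();

    ResourceLimiter(int max) {
        this.max = max;
    }

    /** Call before creating a resource; throws once the limit would be exceeded. */
    void acquire() {
        if (live.incrementAndGet() > max) {
            live.decrementAndGet(); // roll back the failed reservation
            throw new IllegalStateException("exceeded max number of resources: " + max);
        }
    }

    /** Call when a resource is destroyed so that a new one may be created. */
    void release() {
        live.decrementAndGet();
    }

    int live() {
        return live.get();
    }
}

final class LimitDemo {
    /** Returns true if a resource could be reserved, false if the limit was hit. */
    static boolean tryAcquire(ResourceLimiter limiter) {
        try {
            limiter.acquire();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ResourceLimiter limiter = new ResourceLimiter(2);
        System.out.println(tryAcquire(limiter));
        System.out.println(tryAcquire(limiter));
        System.out.println(tryAcquire(limiter));
        limiter.release();
        System.out.println(tryAcquire(limiter));
    }
}
```

In a metaobject, acquire would be called from the hook that intercepts window construction and release from a hook on the method that disposes of a window. Which hooks those are is left open here.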


5. Kava Implementation 


5.1 Architecture 

Kava is written purely in Java and does not require any
special Java Virtual Machine to work. The link be- 
tween metaobjects and objects is realised by the rewrit- 
ing of classes and addition of hooks into the class code. 
Figure 3 shows the Kava architecture. A classloader 
reads the class file as a stream of bytes. These can be 
retrieved from any source, normally from a file or from 
across the network. The classloader parses the byte 
stream and creates a JVM specific representation of a 
class. Normally this is passed to the verifier before it is 
instantiated by the JVM. However, Kava is used to 
intercept the byte stream before the classloader con- 
structs the JVM specific class and applies the standard 
code transformations that realise control by metaob- 
jects. As stated earlier Kava uses a binding specifica- 
tion file to determine what behaviours of what classes 
are to be brought under the control of particular 
metaobjects. It then adds traps into the code of the
class to switch control from the base level to the
meta level (the associated metaobject) when base
level objects carry out certain behaviours.
After rewriting the class to include these traps, the 
classloader passes an internal representation of the 
class to the byte code verifier as before. This means 
that properties such as type safety are still honoured as 
before. 


Note that the metaobjects are loaded by the classloader 
in exactly the same way as any other class, which 
means that they must satisfy the same security proper- 
ties as any ordinary Java class. Metaobjects are ordi- 
nary Java classes and can be compiled which means 
that errors can be caught at an early stage. 


Kava can be invoked either after a class is compiled or 
at the time the class is loaded into the JVM. In order to 
invoke Kava at loadtime a user-defined classloader 
must be used. In either case the traps that are added to 
the class are non-bypassable. 
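
Load-time use depends on a class loader that can see the class bytes before the JVM defines the class. The sketch below shows one way such a loader might be structured with the standard ClassLoader API. It is an assumption-laden illustration, not Kava's loader: RewritingClassLoader, the "instrumented." package prefix and the rewriter function are all hypothetical, and the identity function stands in for the actual byte code transformation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Objects;
import java.util.function.UnaryOperator;

/** Class loader that rewrites the bytes of selected classes before defining them. */
final class RewritingClassLoader extends ClassLoader {
    private final String packagePrefix;           // must not cover java.* classes
    private final UnaryOperator<byte[]> rewriter; // stand-in for the transformation
    private int rewrittenCount = 0;

    RewritingClassLoader(ClassLoader parent, String packagePrefix,
                         UnaryOperator<byte[]> rewriter) {
        super(Objects.requireNonNull(parent));
        this.packagePrefix = packagePrefix;
        this.rewriter = rewriter;
    }

    int rewrittenCount() {
        return rewrittenCount;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                c = name.startsWith(packagePrefix)
                        ? defineRewritten(name)         // selected classes are defined here
                        : getParent().loadClass(name);  // everything else is delegated
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    /** Reads the class file through the parent, rewrites it, and defines the result. */
    private Class<?> defineRewritten(String name) throws ClassNotFoundException {
        String path = name.replace('.', '/') + ".class";
        try (InputStream in = getParent().getResourceAsStream(path)) {
            if (in == null) {
                throw new ClassNotFoundException(name);
            }
            byte[] rewritten = rewriter.apply(in.readAllBytes());
            rewrittenCount++;
            return defineClass(name, rewritten, 0, rewritten.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}

final class LoadTimeDemo {
    /** True if a class outside the prefix is delegated to the parent untouched. */
    static boolean delegatesOtherClasses() {
        RewritingClassLoader loader = new RewritingClassLoader(
                ClassLoader.getSystemClassLoader(), "instrumented.", bytes -> bytes);
        try {
            return loader.loadClass("java.lang.String") == String.class
                    && loader.rewrittenCount() == 0;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

Classes defined by defineClass still pass through the JVM's verifier, so a rewriter that emits invalid byte code fails at load time rather than at an arbitrary later point.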


5.2 Instrumentation 

The Kava metaobject protocol is implemented using 
the technique of byte code rewriting. Kava makes use 
of the Byte Code Engineering Library [6] toolkit to 
implement the standard transformations that add the 
hooks necessary to switch control from the base level to
the meta level at runtime. Using a standard byte code 
rewriting toolkit frees us from dealing with technical
details such as maintaining relative addressing when
new byte codes are inserted into a method, or determining
the number of arguments a method supports before
it has been instantiated as part of a class.


[Diagram. Recoverable labels: Class File (byte stream), Class loader, Metaobject Class File, Binding Specification File, class file structure, Verifier, Interpreter, Runtime system]





Figure 3 - Kava Architecture 


Standard byte code rewritings are used to add hooks for 
individual methods and individual byte code instruc- 
tions. These hooks reify the context of a behaviour that 
is being trapped, invoke the metaobject associated with 
an object and reflect any changes to the context back to 
the base level. The metaobjects that are invoked are 
completely separate from the byte code hooks and are 
developed entirely in Java. This separation means that 
the meta level can be adjusted dynamically at
runtime, although which behaviours are trapped is
determined at load time.


For example, returning to the example of section 3.2 
the class Test included the following run method: 


public void run(String s) { 
System.out.println("hello " + s); } 


After the metaobject MetaTrace is bound to Test 
using the binding specification presented earlier, the 
run method is effectively rewritten by Kava as: 


public void run(String s) {
  Context c = new Context
    (this, "run", "void",
     "java.lang.String", new Object[]
     {s});
  getMeta().beforeExecuteMethod(c);
  if (!c.override()) {
    System.out.println("hello " +
      (String)c.getArg(0));
    getMeta().afterExecuteMethod(c);
  }
}


All of the code surrounding the original println statement
has been added by Kava, and the argument s is now read
back from the context object.


First, at the beginning of the method block a context 
object that represents the invocation frame is created. It 
contains a pointer to the base level object itself, the 
name of the method being executed, the return type, 
the types of the parameters, and the actual parameters 
marshalled into an array of objects. Then the metaob- 
ject associated with the base level object is retrieved 
using a method added earlier by Kava and the be- 
foreExecuteMethod method invoked. In this case 
getMeta() returns a pointer to an instance of a 
MetaTrace so the method name is printed out. Fol- 
lowing the invocation of beforeExecuteMethod 
the arguments passed within the context object are un- 
packed, and the base level code is invoked. Finally, at 
the end of the method block the afterExecute- 
Method method of the associated metaobject is in- 
voked. In this case there was no implementation of the 
method so nothing occurs at the meta level. 


Since a metaobject method may override the corre- 
sponding base level behaviour we add an if ... then 
clause. This ensures that when an override is indicated 
then the base level behaviour is suppressed. 


This is an example of a standard transformation for a
block of code. The transformation for intercepting be- 
haviour such as setting the value of a field is very simi- 
lar but finer-grained with the hook code being around a 
single instruction. 


5.3 Binding language 

As explained in section 5.1 the binding specification 
file determines where Kava introduces the hooks into 
the base level code. The concept is that to make 
metaobjects more reusable the binding should be speci- 
fied completely separately of both the base and meta 
level. 


The binding specification contains multiple base object 
and metaobject class bindings. Each binding is between 
one class and a metaobject class. For that binding the 
particular behaviours to be brought under the control of 
the metaobject are specified, for example the execution 
of methods, or the setting of fields. These are param- 
eterised by information such as the name of the field or 
method, the type of the target (in the case of setting a 
field, or invoking a method) etc. 
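
How a parameterised binding might be matched against a class and method can be sketched as follows. The matching rule is an assumption made for illustration ("*" matches any name, anything else must match exactly), and ExecuteBinding and BindingDemo are hypothetical names. The paper does not specify Kava's matching semantics at this level of detail.

```java
import java.util.List;

/** Hypothetical in-memory form of one execute binding from a binding specification. */
final class ExecuteBinding {
    private final String classPattern;
    private final String metaClass;
    private final String methodPattern;

    ExecuteBinding(String classPattern, String metaClass, String methodPattern) {
        this.classPattern = classPattern;
        this.metaClass = metaClass;
        this.methodPattern = methodPattern;
    }

    String metaClass() {
        return metaClass;
    }

    /** True if this binding covers the execution of the given method of the given class. */
    boolean appliesTo(String className, String methodName) {
        return matches(classPattern, className) && matches(methodPattern, methodName);
    }

    // Assumed rule: "*" is a wildcard, any other pattern must match exactly.
    private static boolean matches(String pattern, String name) {
        return "*".equals(pattern) || pattern.equals(name);
    }
}

final class BindingDemo {
    /** Returns the metaobject class of the first matching binding, or null if none matches. */
    static String metaClassFor(List<ExecuteBinding> bindings, String className, String methodName) {
        for (ExecuteBinding b : bindings) {
            if (b.appliesTo(className, methodName)) {
                return b.metaClass();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<ExecuteBinding> bindings = List.of(new ExecuteBinding("Test", "MetaTrace", "*"));
        System.out.println(metaClassFor(bindings, "Test", "run"));
        System.out.println(metaClassFor(bindings, "Other", "run"));
    }
}
```

Because the metaobject class appears only as a name in the binding, the same metaobject can be reused under different bindings without being recompiled.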


5.4 Special Features 

In this section we give an overview of some of the spe- 
cial features supported by Kava: strong encapsulation, 
reflection on inherited methods, exception handling 
and context objects. 


5.4.1 Strong encapsulation 

One of the benefits of the Kava implementation is its 
support for strong encapsulation. Strong encapsulation 
is the property that it is difficult to bypass the metaobject
bound to the base level object. This has been
achieved by avoiding the use of a separate wrapper 
class. Since hooks are added directly into method bod- 
ies we greatly reduce the possibility that the hooks 
could be bypassed. This is because there is no way to 
express in the Java language a branching to an arbi- 
trary point in a method body. 


It is true that if a malicious code transformer rewrote a 
class file that was pre-processed by Kava then our 
hooks could be removed. However, this can be easily 
guarded against through the use of a mixture of operat- 
ing system protection and the use of code signing tech- 
niques. 


5.4.2 Inherited Methods 

When a Java class inherits a method from its super- 
class the bytecode implementing the method is not re- 
produced in the implementation of the class. For ex- 
ample, if class C inherits the run method from class D 
then the byte code for run is still to be found in D's 
class file not C's class file. If we bind C to a metaobject 
metaclass MC, and try and bring the execution of run 
under the control of MC, Kava will fail because it can- 
not find the byte code implementation of run. The ob- 
vious answer is to bind MC to D as well and add the 
hooks into D's run method. However, we may not want 
D to be brought under the control of MC, indeed we 
may even want it to be bound to an entirely different 
metaclass. 


The answer to this problem is to add a method get- 
Meta to each base level class that returns a pointer to 
the metaobject bound to the base level object. We then 
ensure that superclass methods inherited by classes that 
are bound to a metaobject have hooks added to them 
that use the getMeta method to determine which 
metaobject to invoke. 


When an instance of C has its inherited method run 
invoked the JVM’s dynamic resolution of method calls 
will mean that the getMeta method appropriate to C 
will be invoked. This means that the metaobject bound 
to C will be returned. 
When an instance of D has its method run invoked,
the JVM's dynamic resolution of method calls will 
mean that the getMeta method appropriate to D will 
be invoked. This means that the metaobject bound to D 
will be returned. 


This approach ensures that the correct metaobject is 
invoked in both cases. The approach taken here is 
similar to that found in [3]. 
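
A minimal sketch of this dispatch, assuming a hypothetical one-method MetaObject interface and hooks written at source level rather than added to the byte code:

```java
// Hypothetical metaobject type; a name is all this illustration needs.
interface MetaObject {
    String name();
}

class D {
    // Added to every base-level class: returns the metaobject bound to it.
    MetaObject getMeta() {
        return () -> "metaobject of D";
    }

    // The hook inside run asks getMeta() which metaobject to involve.
    String run() {
        return "run handled by " + getMeta().name();
    }
}

class C extends D {
    // C inherits run unchanged but supplies its own binding.
    @Override
    MetaObject getMeta() {
        return () -> "metaobject of C";
    }
}
```

Calling run on an instance of C executes the code found in D, yet the virtual call to getMeta selects the binding of C.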


5.4.3 Exception Handling 

Kava allows the raising and throwing of exceptions to 
be intercepted and handled by the metaobject bound to 
an object. The beforeException method is in- 
voked before an exception is thrown at the base level. It 
allows the exception throwing to be overridden; this
might be necessary where the metaobject is implement- 
ing distribution at the meta level and the exception has 
to be propagated to a remote client. The afterEx- 
ception method is invoked after an exception has 
been thrown or raised at the base level. It doesn't allow 
overriding of this behaviour but does allow additional 
processing to take place such as the propagation of the 
exception to related objects if a number of objects are 
co-operating and need to be aware of each other's 
status. 


5.5 Context Objects 

Kava uses the concept of Context objects to simplify 
the metaobject protocol and also allow the possibility of 
lazy reification. The metaobject sometimes will need 
access to runtime instances of Method, Class or 
Field. However, generating these is a relatively ex- 
pensive process so we defer their creation by passing 
the minimum information needed to derive them in a 
Context object. As part of the Context interface 
we provide methods for generating the reified in- 
stances. The standard Java reflective API is used to 
generate these instances. In the future we would like to 
apply the same technique to the actual parameters 
passed to the metaobject. 
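
The idea of deferring reification can be sketched as below. The MethodContext class and its members are hypothetical names chosen for this example; only the calls into the standard java.lang.Class and java.lang.reflect.Method API are real.

```java
import java.lang.reflect.Method;

// Hypothetical context object: it carries only cheap identifying data and
// derives the reflective Method instance when a metaobject asks for it.
final class MethodContext {
    private final Class<?> owner;
    private final String methodName;
    private final Class<?>[] parameterTypes;

    MethodContext(Class<?> owner, String methodName, Class<?>... parameterTypes) {
        this.owner = owner;
        this.methodName = methodName;
        this.parameterTypes = parameterTypes.clone();
    }

    // Cheap: available without touching the reflection API.
    String getMethodName() {
        return methodName;
    }

    // Comparatively expensive: performed only on demand.
    Method getMethod() throws NoSuchMethodException {
        return owner.getMethod(methodName, parameterTypes);
    }
}
```

A metaobject that only needs the method name never pays for the lookup.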


5.6 Performance 

We have made some preliminary measurements of the 
performance of Kava. They indicate that the most ex- 
pensive operation is the generation of the context. 
Presently, this expense more than doubles the execution
time of a number of instructions. We are currently
exploring two main approaches to improving
performance. The first approach is to use caching of 
context information, and the second is to allow selec- 
tive reification. 


6. Related Work 

In this section we briefly review a number of other re- 
flective Java implementations and attempt to categorise 
them according to the point in the Java class lifecycle
that reflection is implemented. 


The Java class lifecycle is as follows. A Java class
starts as source code that is compiled into byte code, it 
is then loaded by a class loader into the Java Virtual 
Machine (JVM) for execution, where the byte code is 
further compiled by a Just-In-Time compiler into plat- 
form specific machine code for efficient execution. 


Different reflective Java implementations introduce 
reflection at different points in the lifecycle. The point 
at which they introduce reflection tends to characterise 
the scope of their capabilities. In order to bring the 
base level under the control of the meta level the base 
level system is modified through the addition of traps. 
These traps are known as meta level interceptions [16]. 
For example, in Reflective Java method calls sent to 
the base object are brought under control of an associ- 
ated metaobject by trapping each method call to the 
baseobject. This is done by pre-processing the source 
code of the base level class. A contrasting example is 
MetaXa where the traps are in the implementation of 
the dispatch mechanism of the Virtual Machine. As the 
traps exist in the Virtual Machine itself, the source 
code of classes to be made reflective is not required. 
However, unlike Reflective Java, a specialised JVM 
must be used. 


Table 1 summarises the features of various reflective 
Java implementations. All these implementations have 
drawbacks that make them unsuitable for use with 
compiled components or in a standard Java environ- 
ment where the purpose is to add security. Some re- 
quire access to source code, and others are non- 
standard because they make use of a modified Java 
platform. 


Point in Lifecycle | Reflective Java | Description | Capabilities | Restrictions
-------------------|-----------------|-------------|--------------|-------------
Source Code | Reflective Java [17] | Preprocessor. | Dynamic switching of metaobjects. Intercept method invocations. | Can't make a compiled class reflective, requires access to source code.
Compile Time | OpenJava [18] | Compile-time metaobject protocol. | Can intercept wide range of operations, and extends language syntax. | Requires access to source code.
Byte Code | Bean Extender [19], Dalang [20], JavaAssist [7] | Byte code preprocessor (Bean Extender), byte code rewriting as late as load time (Dalang, JavaAssist). | No need to have access to source code. | Bean Extender: restricted to Java Beans. Dalang and JavaAssist: limited capabilities. Requires offline preprocessing.
Runtime | MetaXa [21], Rjava [22], Guarana [23] | Reflective JVMs. | Can intercept wide range of operations. Can be dynamically applied. | Custom JVM.
Runtime | java.lang.reflect [15] | Reflective capabilities part of the standard Java development kit. | Runtime introspection, dynamic dispatch and on-the-fly generation of proxies. | Overall introspection rather than behavioural or structural reflection.
Just-in-time Compilation | OpenJIT [24] | Compile-time metaobject protocol for compilation to machine language. | Can take advantage of facilities present in the native platform. No need for access to source code. Dynamic adaptation. | Custom Just-in-time compiler.

Table 1 - Reflective Java Implementations 


In contrast, Kava does not require access to source 
code because it is based on byte code rewriting, doesn't 
require a non-standard Java environment and provides 
a rich set of capabilities. It also provides what we refer 
to as strong encapsulation. Most implementations add 
traps through renaming of classes, or renaming meth- 
ods, which means that it may be possible to call the 
original methods and therefore bypass the meta layer. 
Kava actually adds the traps directly into the method 
bodies avoiding this problem. Dalang was an earlier 
implementation of a loadtime reflective Java we implemented
that suffered from this problem. See [20] for
an account of the evolution of Kava from Dalang. 


The closest reflective Java to Kava is a behavioural 
reflection add-on implemented as a demonstration of 
the capabilities of JavaAssist, a byte code rewriting tool 
based on structural reflection. Like Kava, this add-on 
adds hooks to the classes using byte code rewriting and 
has a similar meta level architecture of a binding be- 
tween an object and a metaobject. However, it does not 
provide reflection on static members, on method invocation,
or exception raising. Also it doesn't support
reflection on methods that have been inherited from a
superclass, nor does it support the concept of a binding
specification.


6th USENIX Conference on Object-Oriented Technologies and Systems


USENIX Association


7. Conclusions and Future Work 

Kava focuses on the behavioural changes programmers 
want to impose on third-party code instead of the 
messy structural changes that byte code transformation 
tools deal with. Kava allows adaptations to be devel- 
oped, compiled and tested independently of the target 
code, then declaratively combined with the target code. 
This reduces the chance of error and makes that task of 
adapting the behaviour of third-party code more tracta- 
ble. 


Kava implements behavioural reflection in Java using 
byte code transformation as the underlying technique. 
This approach has allowed the creation of a tool that, unlike
other reflective Java implementations, is portable and
can reflect on a wide range of runtime behaviours.

Kava is available for download from 
http://www.cs.ncl.ac.uk/research/dependability/ 
reflection. We are currently in the process of tuning the
implementation to support lazy reification of context 
objects. We are also investigating the application of 
Kava to a case study based on flexible security for an 
enterprise modelling system. 


Abstract


This paper presents a pragmatic way of implement- 
ing content-based publish/subscribe in a strongly 
typed object-oriented language. In short, we 
use structural reflection to implement filter ob- 
jects through which applications express their sub- 
scription patterns. Our approach is pragmatic in 
the sense that it obviates the need for any specific
subscription language. It preserves encapsulation
of message objects and helps avoid errors.
We illustrate our approach in the context of
Distributed Asynchronous Collections (DACs), pro- 
gramming abstractions for message-oriented inter- 
action. DACs are implemented in Java, whose in- 
herent reflective capabilities fully satisfy the require- 
ments of our content-based subscription scheme. 
Our approach is however not limited to the context 
of DACs, but could be put to work easily in other 
existing event-based systems. 


1 Introduction 


Publish/subscribe in perspective. The im- 
portance of flexible, well-structured, but especially 
scalable communication mechanisms has been dras- 
tically increasing in the last decade. Applications 
tend to become very dynamic, i.e., components are 
not always up and are not locality-bound. These 
constraints highlight the demand for more flexible
communication models, reflecting the nature of tomorrow's
applications. The publish/subscribe interaction
style has proven its ability to fill this gap.
Based on the concept of information bus [OPSS93],
publish/subscribe promotes the decoupling of parties
in time as well as space:¹ consumers subscribe
to the information bus by specifying the nature of
the information they are interested in, and producers
publish information on that bus.


*This work is partially supported by Agilent Laboratories
and Lombard Odier & Co.

The classical topic-based or subject-based pub- 
lish/subscribe style involves a classification of 
the information by introducing group-like notions 
[Pow96], and is incorporated by most industrial 
strength solutions, e.g., [Cor99, TIB99, Ske98, 
AEM99]. Topics are however static and allow only 
a limited expressiveness [Car98]. More recently, re- 
search efforts have been targeted towards content- 
based (property-based [RW97]) publish/subscribe 
schemes [Car98, SA97, BCM+99]. This more flexi- 
ble variant removes entirely the “arbitrary” division 
of the information space, and lets consumers delineate
their individual interests by expressing properties
of messages they wish to receive.


Current practice. Common event-based systems
relying on the content-based publish/subscribe
paradigm equate properties of messages to attributes
of those messages. In most cases, a subscription
language is used to express ranges of values for those
attributes, which violates object encapsulation: a
subscription pattern² expressed with such a
subscription scheme exposes the message's state, and the
resulting filter queries messages by accessing their
attributes. Furthermore, subscription languages can
not be extended or customized by the application
developer, they are orthogonal and redundant with
the programming language,³ and they are very
error prone: syntax errors violating the subscription
grammar are only caught at runtime when a pattern
is parsed, just like syntactically correct constraints
based on misspelled attribute names.


¹Time decoupling: the interacting parties do not need to
be up at the same time. Space decoupling: the interacting
parties do not need to know each other.

²In [Car98], the notion of pattern is used in a different
sense, namely to express event-correlation: a notification is
triggered upon arising of a combination of several elementary
events.

³A similar mismatch has been widely discussed in the
domain of object-oriented databases, where two separate
languages coexist; one for the definition of data and another one
for the querying of data [BZ87].


Filter library. Our approach, which has been
realized in the context of DISTRIBUTED ASYNCHRONOUS
COLLECTIONS (DACs) [EGS00a] (simple JAVA programming
abstractions which encompass different message-oriented
interaction styles), avoids any subscription language
and respects encapsulation. It promotes the expression
of subscription patterns by combining general-purpose
filter objects. These filter objects preserve
encapsulation by querying message objects through
methods which are dynamically defined by the application,
along with the semantics of the evaluation of
the invocation results. The subscription grammar
is inherently expressed through the resulting API, 
which strongly reduces the number of runtime er- 
rors. Filters are thus pictured as first class citizens, 
and their implementation relies on structural reflec- 
tion [Coi87] of the message objects. 


Structural reflection. As pointed out in 
[Fer89], there are mainly two kinds of reflection. 
The computational (behavioral) reflection is concerned
with the reification of computations and
their behaviour. In contrast, structural reflection 
reifies the structural aspects of a program, such as 
data types. 

As we will show in this paper, structural reflection 
can be used to express subscription patterns in 
content-based publish/subscribe the same way 
it has already been used in object-oriented data 
management systems to express object queries 
(e.g., [SO95]). In our particular context, structural
reflection can be reduced to a single aspect: the 
capability of representing structures of objects. 
This is sufficient to dynamically define the methods 
that our filters must use to query message objects. 
The introspection capabilities of JAVA [Sun99a] 
offer sufficient support for this, and the possibility 
of modifying data structures is not required. 


Contributions. This paper presents how we have 
realized content-based publish/subscribe in our 
DAC framework for distributed computing, which 
is implemented in JAVA on UNIX. We illustrate how 
our approach (1) circumvents the need for any subscription
language, (2) preserves object encapsulation,
and (3) helps avoid type errors. We discuss
the flexibility/performance trade-off introduced by 
our use of reflection by outlining the optimizations 
we have applied, such as runtime generation of static
code from dynamically defined filters. 


Roadmap. This paper is structured as follows: 
Section 2 overviews the limitations of existing ap- 
proaches to expressing content-based subscription 
patterns. Section 3 presents our approach to 
content-based publish/subscribe based on structural 
reflection. In Section 4 we illustrate the use of our 
subscription scheme through a small example. Sec- 
tion 5 discusses performance issues. Section 6 high- 
lights alternative approaches. Section 7 concludes 
with final remarks. 


2 Approaches to Content-Based 
Publish/Subscribe: Background 


Content-based publish/subscribe removes limita- 
tions of the static topic-based flavor, but suffers 
from the dynamism it introduces. Besides making 
the reuse of existing multicast primitives problematic
[OAA+00], content-based publish/subscribe is
hard to express in an object-oriented setting. In 
this section, we illustrate the latter difficulty by 
outlining the limitations of existing content-based 
schemes. 


2.1 Subscription Languages 


In content-based publish/subscribe, subscription 
languages are the most commonly used means of 
describing subscription patterns. Such languages 
can be based directly on the attributes of the de- 
scribed objects or on additional properties attached 
to those objects. By viewing asynchronous invoca- 
tions as events, the arguments of such invocations 
can be used as matching criteria. 


Attributes. In systems like SIENA [Car98],
ELVIN [SA97] or GRYPHON [BCM+99, SBS98],⁴
event notifications are viewed as flat structures,
i.e., records with several fields. A subscription
language is used to impose ranges of values for those
fields. Figure 1 outlines this concept schematically.
Relying on attribute-value pairs enables very
efficient realizations, since computational overhead is
reduced by directly accessing attributes. This
approach however bears several dangers:


Violation of object encapsulation: In the example
outlined in Figure 1, the from attribute is used
as subscription criterion, and is consequently
directly accessed when the object is queried.


Errors: Syntax errors violating the subscription
grammar are only caught once a pattern is parsed,
i.e., at runtime. Another, more insidious type of
error results from mistyped attribute names.
Subscription patterns containing such errors do
not violate the subscription grammar, and might
remain undetected without type checks.


Learning phase: Subscription syntaxes are often
very complex and used with a single
publish/subscribe middleware. This reduces
portability of applications.


Message m  | public class ChatMsg {
           |     public String from;
           | }
Criteria   | “Message sent by Tom”
Argument   | String criteria = "from is Tom";
Evaluation | m.from.equals("Tom")


Figure 1: Subscription Language


⁴In GRYPHON, reflection is also used ([SBS98]), but not
for the expression of subscription patterns: the GRYPHON
system uses the same information dissemination mechanisms
it offers to applications (which is its primary concern) for
internal protocol communication.


To increase portability of applications some engines
implement standardized APIs like the OMG's
CORBA NOTIFICATION SERVICE [OMG00], which
remedies certain shortcomings [SV97] of the CORBA EVENT
SERVICE [OMG98]. Among the new features in
[OMG00] is a content-based subscription scheme
based on a simplified kind of typed events, replacing
the typed events of the ancestor. These structured
events are roughly composed of two types of
fields, namely (1) fixed fields and (2) variable fields
consisting of name-value pairs, to which applications
map their specific needs. The fields of messages
are seen as their attributes and are directly
accessed through filter objects for content-based
filtering, which violates encapsulation. Patterns are
expressed by strings following the DEFAULT FILTER
CONSTRAINT LANGUAGE, a complex subscription
language which extends the TRADER CONSTRAINT
LANGUAGE.


Properties. The SUN counterpart to the
CORBA NOTIFICATION SERVICE is the JAVA
MESSAGE SERVICE (JMS) API [HBS98]. The
JMS covers topic-based publish/subscribe
(all-of-n) as well as message queuing (one-of-n)
[BHL95, DEC94, Sys00, Mic97]. Content-based
filters can be applied with both interaction schemes.
The filtering is based on attributes of the message
headers, and on properties (name-value pairs),
which are explicitly attached to message objects.
Subscription patterns are expressed as JAVA strings.
The specification includes a subscription grammar
that these strings must respect.

Properties explicitly attached to message objects 
are artificial and in practice strongly redundant 
with the information carried by those objects. In 
many cases, the properties are faithful duplicates of 
the attributes of the message objects, which leads 
to violating encapsulation. 


Arguments. MICROSOFT's COM+ [Obe00] promotes
a model similar to the abandoned typed
model of the CORBA EVENT SERVICE. Asynchronous
invocations are viewed as events, but the
latter are not reified. The primary filtering is
thus made on the types of the subscribers, as illus- 
trated by Figure 2. By viewing an invocation as an 
event, the invocation arguments can be viewed as 
the attributes of the resulting notification. Filters 
in COM+ are expressed on invocation arguments 
through a limited subscription grammar. Encapsu- 
lation seems to be preserved by avoiding the reifica- 
tion of events. 


Subscriber s | public class Chatter {
             |     public void in(String from, ...);
             | }
Criteria     | “message sent by Tom”
Argument     | String criteria = "from is Tom";
Evaluation   | from.equals("Tom")





Figure 2: Events vs. Invocations 


2.2 Template Objects


The JAVASPACE specification [Sun99b] (inspired by 
LINDA’s TUPLE SPACE [Gel85]) adopts an approach 
based on template objects.

When subscribing to a JAVASPACE, a subscriber 
provides a template object t. A message object m 
is only delivered to that subscriber if m conforms to 
the type of t, and if every attribute of t which is not 
null references an object equal to the corresponding 
attribute of m (cf. Figure 3). Equality is tested by 
comparing byte-wise the two objects in marshalled 
form. As shown by [FHA99], this approach repre- 
sents a very convenient subscription scheme which 
can be put to work easily. However, encapsulation 
is violated, and there are certain limitations in ex- 
pressiveness: 


Limited comparisons: Attributes are compared for 
strict equality, and it is not straightforward to 
express a range (discrete or not) of possible values 
for an attribute. 


Limited granularity: In JAVA, an attribute can ref- 
erence an object, which itself has attributes, etc. 
Attributes of JAVASPACE entries are however 
matched as a whole. This limitation is also found 
with most of the previous approaches based on 
subscription languages. 


Limited combinations: By providing a template ob- 
ject t, a subscriber will receive every object m 
whose attributes all match the attributes of t. 
It is thus difficult to express alternatives (or) on 
different attributes. 


Limited values: Since null is chosen to play the role 
of wildcard, attributes can not be of native types, 
and null can not be easily used as a concrete 
value for an attribute. For each such attribute 
[Sun99c] proposes to add an additional boolean 
attribute to indicate a null value. 
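
The matching rule and its limitations can be made concrete with a small sketch. It is not the JAVASPACE implementation: the TemplateMatcher class is a hypothetical helper, the ChatMsg fields are chosen for illustration, and equality is tested with equals() instead of comparing marshalled bytes.

```java
import java.lang.reflect.Field;

class ChatMsg {
    public String from;
    public String text;
}

// Hypothetical helper imitating template matching with null as wildcard.
final class TemplateMatcher {
    static boolean matches(Object template, Object message)
            throws IllegalAccessException {
        // The message must conform to the type of the template.
        if (!template.getClass().isInstance(message)) {
            return false;
        }
        // Every non-null public attribute of the template must be equal
        // to the corresponding attribute of the message.
        for (Field field : template.getClass().getFields()) {
            Object expected = field.get(template);
            if (expected != null && !expected.equals(field.get(message))) {
                return false;
            }
        }
        return true;
    }
}
```

The loop shows why only conjunctions of strict equalities can be expressed, and why a field holding null can never be required to be null.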


3 Reflection-Based 
Publish/Subscribe 


In this section we present our approach to specifying 
content-based subscription patterns. It is based on 
structural reflection of message objects and avoids 
limitations stated in the previous section. 


Message m  | public class ChatMsg {
           |     public String from;
           | }
Criteria   | “message sent by Tom”
Argument   | ChatMsg t = new ChatMsg();
           | t.from = "Tom";
Evaluation | m.from.equals(t.from)


Figure 3: Template Object 


3.1 Overview 


Roughly speaking, the application programmer defines
conditions on message objects by specifying
methods through which these objects should be
queried, along with expected values that are compared
to the values returned by invoking these methods.

Subscription patterns are expressed through an 
API, which inherently expresses a subscription
grammar: because patterns are built by instantiating
and combining filter objects, syntax errors violating
the grammar are detected by the JAVA compiler.
Thanks to the structural reflection of message
objects, type errors are avoided by verifying the
methods specified by the application.

In the following, we present our customizable fil- 
ter objects, called conditions. These enable the dy- 
namic definition of conditions on message objects, 
and are realized in a general manner through acces- 
sors, which we introduce first. 


3.2 Accessors 


Accessors are specific objects used to access partial 
information on the runtime message objects. 


Querying objects. Informally, an accessor $A$
is characterized by a set of tuples:
$A = ((M_1, P_1), \ldots, (M_k, P_k))$, where every $M_i$ is a method
and $P_i = (P_{i,1}, \ldots, P_{i,l_i})$ its corresponding argument
list. Whenever a method $M_i$ is applied to an object,
this implies that it is invoked with its arguments
$P_i$.

An accessor can be seen as a function which, applied
to a message object, returns another object:
$A(o : obj) \to obj$. When such an accessor $A$ is evaluated
for a message object m, $M_1$ is invoked on m
and every method $M_{i+1}$ ($0 < i < k$) is recursively
invoked on the result of $M_i$. Finally, the result of
$M_k$ is returned.⁵

In JAVA, an accessor object implements the interface 
Accessor given in Figure 4, and is evaluated by call- 
ing the get() method with the message object as 
argument. This method can throw exceptions raised 
when evaluating the method chain, which enables 
the reaction to exceptions. Returning null in case 
of exceptions would contradict the use of null as 
matching criterion. 


public interface Accessor { 


public Object get(Object m) throws Exception; 
} 


Figure 4: Accessor Interface 


Using Java reflection. To implement our acces- 
sors, we rely on structural reflection. The inherent 
JAVA language reflection capabilities [Sun99a] con- 
sist in a type-safe API that supports introspection 
about classes and objects in the current JAVA VM 
at runtime. We view introspection as one aspect 
of structural reflection, limited to the reification (in 
the sense of representation) of structures of types 
and classes at runtime. A second aspect, the mod- 
ification of those structures is, like computational 
reflection, not addressed by the JAVA core reflection 
API. 

In short, JAVA provides meta-objects which reify 
classes, methods, fields, constructors, etc. We 
make extensive use of meta-objects for methods 
(java.lang.reflect.Method) to reify the $M_i$'s
of accessors. This defers to runtime the choice
of which method is to be invoked, and also makes
it possible to perform such a dynamic invocation.⁷
We avoid using objects reifying attributes
(java.lang.reflect.Field), since dealing with 
them means abandoning encapsulation. 
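
A simplified accessor, restricted to argument-less methods, can be sketched as follows. The MethodChain and ChatMsg classes are hypothetical and exist only for this illustration; the Accessor interface repeats Figure 4 so that the sketch compiles on its own.

```java
import java.lang.reflect.Method;

interface Accessor {
    Object get(Object m) throws Exception;
}

// Hypothetical accessor for a chain of argument-less methods.
final class MethodChain implements Accessor {
    private final Method[] methods;

    MethodChain(Method... methods) {
        this.methods = methods.clone();
    }

    @Override
    public Object get(Object m) throws Exception {
        Object current = m;
        // The first method is invoked on m, every further method on the
        // result of its predecessor. An empty chain returns m itself.
        for (Method method : methods) {
            current = method.invoke(current);
        }
        return current;
    }
}

class ChatMsg {
    private final String from;

    ChatMsg(String from) {
        this.from = from;
    }

    public String getFrom() {
        return from;
    }
}
```

A chain built from ChatMsg.class.getMethod("getFrom") and String.class.getMethod("length") queries the length of the sender name without touching any field of the message.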


⁵With $k = 0$, the object m itself is accessed as a whole.
$l_i = 0$ means that $M_i$ is an argument-less method. We do
not consider side-effects of the access methods $M_i$.

⁶JAVA 1.3 integrates a limited mechanism for computational
reflection with the java.lang.reflect.Proxy class.

⁷Note that with JAVA method objects, a native value is
wrapped by an instance of its corresponding object type,
which makes the nesting of invocations even simpler.





public final class Invoke
    implements Accessor, java.io.Serializable
{
    /* only one method, can be null */
    public Invoke(Method M, Object[] args) {...}
    /* with nested accessor */
    public Invoke(Accessor nested, Method M,
                  Object[] args) {...}
    /* structurally conformant objects, nesting */
    public Invoke(String methodNames,
                  Object[][] args) {...}

    public Object get(Object m)
        throws Exception {...}
}


Figure 5: Invoke Class (Excerpt) 


Specifying methods. We have used JAVA reflec- 
tion for the implementation of the Invoke class 
shown in Figure 5, a general-purpose accessor. The 
first constructor enables the expression of a single 
method invocation. The other constructors shown 
in the figure enable the creation of an accessor re- 
flecting nested method calls; by specifying an ex- 
plicitly created nested accessor, or by specifying 
the names of the methods to be invoked. This yields
two ways for an application to specify a
method:


By method object: The application explicitly deals 
with reflection, and provides a Method object. As 
explained in [BW98], JAVA enforces name equiva- 
lence of types, and a method object M is therefore 
bound to a single type T: if a method M for type T 
is applied to an object m which does not conform 
to T, null is returned — even if m implements a 
method of the same name and signature as M.
By specifying methods as objects, the application 
implicitly defines the type of message objects it is 
interested in. 


By method name (and signature): Specifying the 
name of a method and its arguments (and implic- 
itly the method’s signature) can be interesting 
to enforce structure equivalence of types, i.e., 
subscribing to all objects which implement a 
given method, independently of their type. This 
implies, for each evaluated message object, a dy- 
namic lookup of the corresponding method object 
(through java.lang.Class) by the accessor.


In Section 5 we evaluate the two possibilities in
terms of efficiency, and show that the knowledge 
of the type of message objects is important for per- 
formance optimizations. 
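
The difference between the two ways of specifying a method can be illustrated with the plain reflection API. The ByName helper and the two message classes are hypothetical. Note that the raw Method.invoke call signals a non-conforming target with an IllegalArgumentException; how an accessor reports that case to its caller is a separate design decision.

```java
import java.lang.reflect.Method;

class ChatMsg {
    public String getFrom() {
        return "Tom";
    }
}

// Unrelated type that happens to offer a method of the same name and signature.
class MailMsg {
    public String getFrom() {
        return "Ann";
    }
}

// Hypothetical helper: the Method object is looked up anew for each message
// object, so any type offering the named method is accepted.
final class ByName {
    static Object call(Object m, String methodName) throws Exception {
        Method method = m.getClass().getMethod(methodName);
        return method.invoke(m);
    }
}
```

ByName.call works for both ChatMsg and MailMsg instances, whereas the Method object obtained from ChatMsg.class cannot be applied to a MailMsg. The per-object lookup is the price paid for this structural matching.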


Avoiding type errors. Knowing the type of the 
matching message objects is also useful for type checking.
If all methods of an accessor are reified, the
return type of each such $M_i$ can be checked for
its conformance to the type bound to $M_{i+1}$. Similarly,
the type of each provided argument $P_{i,j}$ can
be checked for its conformance to the type of the
j-th formal argument of $M_i$. By enforcing these
checks, the Invoke class rules out type errors.⁸ To
enforce such checks without explicit use of reflec- 
tion, message object types can also be specified by 
their name. This is illustrated in Section 4 through 
a small programming example. 
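
The return-type part of such a check could look as follows. ChainCheck is a hypothetical helper, not the Invoke class itself, and for brevity it ignores argument lists and the wrapping of native return values.

```java
import java.lang.reflect.Method;

// Hypothetical helper checking a chain of reified methods.
final class ChainCheck {
    /**
     * Returns true if the return type of every method conforms to the
     * type to which the next method in the chain is bound.
     */
    static boolean isWellTyped(Method... chain) {
        for (int i = 0; i + 1 < chain.length; i++) {
            Class<?> returned = chain[i].getReturnType();
            Class<?> expected = chain[i + 1].getDeclaringClass();
            if (!expected.isAssignableFrom(returned)) {
                return false;
            }
        }
        return true;
    }
}
```

For instance, Object.toString followed by String.length passes the check, while Object.toString followed by Integer.intValue is rejected before any message is ever evaluated.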


3.3 Conditions 


While a message object is queried through an acces- 
sor, a condition object evaluates the obtained infor- 
mation, i.e., decides whether it represents a desir- 
able value. 


Model. A condition C = (A, R, B) represents a
single condition that a message object m must fulfill
in order to be delivered. B is a comparison
function which can be viewed as a binary predicate:
$B(o_1 : obj, o_2 : obj) \to bool$. The two arguments are
(1) a predefined result R and (2) the result of the
invocation chain represented by the accessor A. A
condition is thus evaluated against a message object
m, and evaluates positively iff m satisfies that condition:
$C(o : obj) \to bool$, and $C(m) = B(R, A(m))$.
Figure 6 outlines the different evaluation stages of
a condition. A similar scheme can be found for
object queries in object-oriented data management
systems, e.g., TIGUKAT [SO95].⁹
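
The model can be mirrored almost literally in code. The sketch below is only an illustration of $C(m) = B(R, A(m))$: the class ModelCondition is a hypothetical name, and the accessor and the comparison are represented with the functional interfaces of recent JAVA versions (8 and later), which is an assumption of this sketch rather than the realization described in this paper.

```java
import java.util.function.BiPredicate;
import java.util.function.Function;

/** Hypothetical condition C = (A, R, B); illustrative only. */
final class ModelCondition {
    private final Function<Object, Object> accessor;      // A
    private final Object expected;                        // R
    private final BiPredicate<Object, Object> comparison; // B

    ModelCondition(Function<Object, Object> accessor, Object expected,
                   BiPredicate<Object, Object> comparison) {
        this.accessor = accessor;
        this.expected = expected;
        this.comparison = comparison;
    }

    /** C(m) = B(R, A(m)) */
    boolean conforms(Object m) {
        return comparison.test(expected, accessor.apply(m));
    }

    public static void main(String[] args) {
        // A: length of the message seen as a string, R: 5, B: equality.
        ModelCondition lengthIsFive =
            new ModelCondition(m -> ((String) m).length(), 5, Object::equals);
        System.out.println(lengthIsFive.conforms("Alice")); // true
        System.out.println(lengthIsFive.conforms("Bob"));   // false
    }
}
```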


⁸ To verify whether a given reified type conforms to
another one, we mainly rely on the isAssignableFrom()
method in class java.lang.Class.

⁹ The major difference between queries in an object
database and the filtering of messages by a middleware is 
the duration of a query. With a middleware system based on 
content-based publish/subscribe, the query is expressed for 
future objects. In object databases, queries are performed on 
a snapshot of the system, but the expression of the query can 
be made similarly. [PO93] also describes the use of reflection 
for a closer integration of the language with TIGUKAT. 

Figure 6: Applying a Condition to a Message Object


Basic conditions. In JAVA, a condition object 
implements the Condition interface given in Fig- 
ure 7. It is evaluated for a given message object m by 
invoking conforms() with m as argument. The con- 
dition classes we propose are conceptually similar to 
the predicates found in JGL [Obj99] that are used in 
conjunction with centralized collections. The main 
difference is that our condition objects are specific 
to publish/subscribe, by representing queries on fu- 
ture objects. 

What differentiates our condition classes are the 
comparison functions they encapsulate. The other 
attributes, namely accessor and result, are initial- 
ization arguments and can thus be factored out. 


public interface Condition {
    public boolean conforms(Object m);
}


Figure 7: Condition Interface 


Comparisons. JAVA inherently defines three ba- 
sic comparison mechanisms, which can be consid- 
ered as candidates for B: 


I. Is object o1 identical to object o2?

The comparison of two objects with the == op- 
erator yields true iff the two arguments are ref- 
erences to the same object. This comparison is 
less useful in our context, since two compared 
objects usually originate from different VMs. 
By default two such objects are never identi- 
cal. 


II. Is object o1 equal to object o2?
Every object can also be compared to any
other object by means of the equals() method,
which is inherent to all JAVA objects and can
be overridden by application-defined classes.

III. How does object o1 compare to object o2?
This is for objects implementing the
java.lang.Comparable interface,¹⁰ providing
a method compareTo(). The return
value is an integer, indicating the order of
the object o1 with respect to o2. Such objects
manifest a natural ordering, e.g., class
java.lang.Integer, and can thus be matched
against a range of values. Comparisons can be
moved out of the compared objects by using
java.util.Comparator objects, which are
binary predicates.
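
The three mechanisms can be contrasted in a few lines. The sketch below uses only standard JAVA classes; the class name ComparisonDemo and the helper inRange are hypothetical and merely illustrate how a natural ordering allows matching against a range of values.

```java
/** Hypothetical demo class contrasting ==, equals() and compareTo(). */
final class ComparisonDemo {

    /** Range matching based on the natural ordering of T (mechanism III). */
    static <T extends Comparable<T>> boolean inRange(T value, T low, T high) {
        return value.compareTo(low) >= 0 && value.compareTo(high) <= 0;
    }

    public static void main(String[] args) {
        String a = new String("Alice");
        String b = new String("Alice");
        System.out.println(a == b);              // false: two distinct objects
        System.out.println(a.equals(b));         // true: same character sequence
        System.out.println(inRange(5, 1, 10));   // true
        System.out.println(inRange(11, 1, 10));  // false
    }
}
```

Objects received from a remote VM are deserialized copies, which is why the identity test of mechanism I behaves like the first line of output.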


In general, B is represented in JAVA by a method,
and can also be viewed as $M_{n+1}$. Inversely, methods
$M_j \ldots M_n$ (j > 1) can be seen as part of the
comparison. In that sense, we provide several shortcuts
for common methods, e.g., to compare the type
of an object to a predefined one. This reflects the 
method isInstance() (the dynamic counterpart to 
the instanceof operator) in java.lang.Class. 

In our condition classes, like the Equals class given 
in Figure 8 (representing an equality test in the 
sense of II), we have added constructors which simplify
their use. The third constructor in the figure
for instance enables the expression of nested method 
calls by providing a URL-like string denoting the 
names of the methods. The accessor is in that case 
created implicitly. Figure 9 shows the links between 
our JAVA implementation of accessors and condi- 
tions, illustrated through the Equals and Invoke 
classes. 


public final class Equals
    implements Condition, java.io.Serializable
{
    /* compare the message object as a whole */
    public Equals(Object to) {...}
    /* compare return value of accessor */
    public Equals(Accessor acc, Object to) {...}
    /* implicit accessor creation */
    public Equals(String names, Object[][] args,
                  Object to) {...}

    public boolean conforms(Object m) {...}
}


Figure 8: Equals Class (Excerpt) 


¹⁰ The counterpart to the well-known Magnitude type in
SMALLTALK [GR83]. 

Figure 9: Class Diagram


3.4 Subscription Patterns


A subscription pattern S represents a combination 
of basic conditions. 


Patterns and conditions. A subscription pattern
$S = ((C_1, \ldots, C_n), F)$ is characterized by a
set of n basic conditions, which are all evaluated
for a given message m, and an n-ary function
$F(b_1 : bool, \ldots, b_n : bool) \to bool$, which is evaluated
for the results of these conditions:
$S(m) = F(C_1(m), \ldots, C_n(m))$. A pattern is thus evaluated
like a condition: $S(o : obj) \to bool$, and is represented
in Java by an object of type Condition. The
model diverges here from the concrete realization, in
that the function F does not appear as such.


Expressing patterns. F is namely explicitly 
constructed by combining conditions. These com- 
binations are expressed through specific conditions, 
reflecting binary predicates, like And (Figure 10), 
Or, etc. Furthermore, we propose a condition Not 
for negation. To ease the expression of combina- 
tions, we introduce the SimpleCondition interface 
(Figure 11), an extension of Condition, which our 
basic conditions in fact implement.¹¹


This subscription scheme based on conditions inher- 
ently expresses the subscription grammar. Syntax 


¹¹ This counteracts JAVA’s lack of operator overloading (as
provided for instance by C++ [Str97]).

errors known from subscription languages, where
they are only recognized at execution of the parser, 
are here detected by the JAVA compiler. 


public final class And
    implements Condition, java.io.Serializable
{
    /* the two arguments */
    private Condition first;
    private Condition second;

    public And(Condition first,
               Condition second)
    { this.first = first; this.second = second; }
    public boolean conforms(Object m)
    { return first.conforms(m) &&
             second.conforms(m); }
}


Figure 10: And Class (Excerpt) 


public interface SimpleCondition
    extends Condition
{
    public SimpleCondition and(Condition with);
    public SimpleCondition or(Condition with);
    public SimpleCondition nand(Condition with);
    public SimpleCondition nor(Condition with);
    public SimpleCondition xor(Condition with);
    public SimpleCondition not();
}


Figure 11: SimpleCondition Interface 
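

To make the combination mechanism concrete, the following sketch shows one possible way of realizing such combinators. It is not the implementation described in this paper: the interfaces Cond and SimpleCond are hypothetical stand-ins for Condition and SimpleCondition, and the sketch relies on default methods and lambdas, which only exist in JAVA 8 and later.

```java
/** Hypothetical stand-in for the Condition interface. */
interface Cond {
    boolean conforms(Object m);
}

/** Hypothetical stand-in for SimpleCondition, using Java 8 default methods. */
interface SimpleCond extends Cond {
    default SimpleCond and(Cond with) { return m -> conforms(m) && with.conforms(m); }
    default SimpleCond or(Cond with)  { return m -> conforms(m) || with.conforms(m); }
    default SimpleCond xor(Cond with) { return m -> conforms(m) ^ with.conforms(m); }
    default SimpleCond not()          { return m -> !conforms(m); }
}

final class CombinatorDemo {
    /** Builds the pattern "is a string AND NOT the empty string". */
    static SimpleCond nonEmptyString() {
        SimpleCond isString = m -> m instanceof String;
        SimpleCond isEmpty = m -> "".equals(m);
        return isString.and(isEmpty.not());
    }

    public static void main(String[] args) {
        SimpleCond pattern = nonEmptyString();
        System.out.println(pattern.conforms("hi")); // true
        System.out.println(pattern.conforms(""));   // false
        System.out.println(pattern.conforms(42));   // false
    }
}
```

An ill-formed combination such as isString.and() is rejected by the compiler, which is the compile-time detection of syntax errors argued for above.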


4 Programming Example 


This section illustrates the use of content-based fil- 
ters through chat sessions based on simple DACs. 
We first recall our notion of DISTRIBUTED ASYN- 
CHRONOUS COLLECTION (DAC) and then build on 
the example initially introduced in [EGS00a].


4.1 Background: Distributed Asynchronous Collections


Just like a conventional collection, a DAC repre- 
sents a group of related objects. A DAC is however 
distributed and can be accessed from various nodes
of a network. In a way similar to a JAVASPACE, a
DAC enables distributed participants to share information
by pulling information from the space, but
also by registering a callback object to be notified of 
future elements. DACs furthermore express through 
their type the qualities of service (QoS) they sup- 
port. In other terms, we offer a framework of DAC 
subtypes representing different semantics and QoS. 
In this example, we will use a DAC representing 
a topic, to which application components subscribe 
with an optional content-based filter. 


4.2 A Chat Scenario 


We concentrate on two chat addicts, Alice and Bob, 
who love to chat deep into the night. Therefore 
they subscribe to the topic /Chat/Insomnia to re- 
ceive all messages from like-minded chatters. Fig- 
ure 12 shows class ChatMsg, which represents a pos- 
sible message class for this application. 


public class ChatMsg
    implements java.io.Serializable
{
    private String sender;
    private String text;
    public String getSender() { return sender; }
    public String getText() { return text; }
    public ChatMsg(String sender, String text) {
        this.sender = sender; this.text = text; }
}


Figure 12: Event Class for Chat Example 


4.3 Publishing


A DAC represents a topic, and in order to access 
such a DAC from a process, a proxy must be created. 
This requires an argument denoting the name of the 
topic represented by the DAC. Except for that topic 
name, the action of creating a DAC proxy is identi- 
cal to creating a local collection. The DAStrongSet 
collection class instantiated in Figure 13 offers re- 
liable delivery of notifications to subscribers. The 
instance called myChat henceforth provides access 
to the topic /Chat/Insomnia. Now it is possible to 
directly publish and receive messages for the topic 
associated to that DAC. 

Creating an event notification for a topic consists in 
inserting a message object into the DAC by issuing
a call to the add() method, from where it becomes 
accessible for any party. 


/* connect */
DASet myChat = new DAStrongSet("/Chat/Insomnia");

/* create new message and publish it */
ChatMsg m = new ChatMsg("Alice", "Hi everyone");
myChat.add(m);





Figure 13: Publishing a Message 


4.4 Subscribing 


In order to subscribe to a DAC, a callback object im- 
plementing the Notifiable interface must be pro- 
vided. Figure 14 shows how to implement a simple 
callback object for chat sessions. 


public interface Notifiable {
    public void notify(Object m);
}

public class ChatNotifiable
    implements Notifiable
{
    public void notify(Object m) {
        /* elements are of type ChatMsg */
        ChatMsg cm = (ChatMsg) m;
        System.out.println("Message from " +
                           cm.getSender());
        System.out.println(cm.getText());
    }
}


Figure 14: Callback Object 


Episode I. Figure 15 shows a first example of 
content-based publish/subscribe. We suppose here 
that Bob is only interested in what a particular par- 
ticipant, Alice, publishes on topic /Chat/Insomnia. 
Bob defines a corresponding condition. The null 
argument in the condition initialization denotes a 
set of empty argument lists. A subscription is 
viewed as an interest in future elements, and is ex- 
pressed by a call to the contains() method. 


/* connect */ 
DASet myChat = new DAStrongSet("/Chat/Insomnia"); 


/* create condition */ 
Condition onlyAlice = 
new Equals("/getSender", null, "Alice"); 


/* create callback object and subscribe */ 
Notifiable n = new ChatNotifiable(); 
myChat.contains(n, onlyAlice); 


Figure 15: Content-Based Subscribing (I) 


Episode II. Suppose now that Bob is only
interested in what Alice says about him. For this
second condition, the text carried by each chat message
must be checked for the occurrence of Bob’s
name. Remember that in JAVA a string s1 can be
checked for the occurrence of a substring s2 by asking
s1 through a call to indexOf() for the index of
its first occurrence of s2. If s2 is not contained in
s1, the call returns -1. The resulting second condition
in the figure represents all messages which
do not contain Bob’s name, and must therefore be
negated.

This example shown in Figure 16 illustrates 
how to easily combine basic conditions with the 
SimpleCondition interface, and how the applica- 
tion can specify the type of the message objects with 
implicit accessor creation, as required for the perfor- 
mance optimizations we propose in the next section. 
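
The dynamic evaluation of a URL-like method path, as used in the second condition, can be sketched as follows. This is an illustration under stated assumptions and not the accessor implementation of our library: the class PathAccessor, its method evaluate and the nested Msg class are hypothetical, the optional type prefix (e.g., "ChatMsg:") is not handled, and argument types are derived from the runtime classes of the arguments, so primitive parameters are not supported.

```java
import java.lang.reflect.Method;

/** Hypothetical helper evaluating a path such as "/getText/indexOf". */
final class PathAccessor {

    /** Minimal message class used for the demonstration only. */
    public static final class Msg {
        private final String sender;
        private final String text;
        public Msg(String sender, String text) { this.sender = sender; this.text = text; }
        public String getSender() { return sender; }
        public String getText() { return text; }
    }

    /** Invokes the named methods in sequence; args[i] holds the arguments of the i-th method. */
    static Object evaluate(Object m, String path, Object[][] args) throws Exception {
        String[] names = path.substring(1).split("/");
        Object current = m;
        for (int i = 0; i < names.length; i++) {
            Object[] a = (args == null || args[i] == null) ? new Object[0] : args[i];
            Class<?>[] types = new Class<?>[a.length];
            for (int j = 0; j < a.length; j++) {
                types[j] = a[j].getClass();
            }
            // Dynamic lookup by name and signature, performed for every message.
            Method method = current.getClass().getMethod(names[i], types);
            current = method.invoke(current, a);
        }
        return current;
    }

    public static void main(String[] argv) throws Exception {
        Msg msg = new Msg("Alice", "Hello Bob");
        Object[][] args = {null, {"Bob"}};
        System.out.println(evaluate(msg, "/getText/indexOf", args)); // 6
        System.out.println(evaluate(msg, "/getSender", null));       // Alice
    }
}
```

Comparing the first result with -1 and negating the outcome yields the "mentions Bob" test of Episode II.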


5 Performance 


Reflective systems and meta-level architectures offer 
increased modularity and flexibility. The benefit of 
such dynamism is often, but not necessarily, dimin- 
ished by performance degradation. In this section 
we first give a rough idea of the cost of dynamic 
code introduced by JAVA reflection. Motivated by 
these results we then propose two optimizations to 
our system, and we discuss their performances. 


5.1 Preliminary: Cost of Reflection


According to the way we have described our imple- 
mentation, methods are invoked dynamically, i.e., 

/* connect */
DASet myChat = new DAStrongSet("/Chat/Insomnia");

/* create first condition, with type specification */
SimpleCondition onlyAlice =
    new Equals("ChatMsg:/getSender", null, "Alice");

/* create args list and corresp. second condition */
Object[][] args = {{null}, {"Bob"}};
SimpleCondition notAboutMe =
    new Equals("ChatMsg:/getText/indexOf", args,
               new Integer(-1));

/* combine conditions */
SimpleCondition pattern =
    onlyAlice.and(notAboutMe.not());

/* create callback object and subscribe */
Notifiable n = new ChatNotifiable();
myChat.contains(n, pattern);


Figure 16: Content-Based Subscribing (II) 


through reified methods. Such dynamic invocations 
are much more expensive than static ones. More- 
over, when subscribing to structurally conformant 
objects (cf. Section 3), method objects are obtained 
at runtime for each message object. Such lookups 
are very costly, and are summed with the overhead 
of the dynamic invocations. 

Figure 17 shows the cost of dynamic calls by com- 
paring the overhead of local method invocations 
with a varying number of arguments (between 0 
and 10 objects). These are performed using (1) dy- 
namic invocations, each combined with a method 
lookup, (2) dynamic invocations without lookups, 
and (3) static invocations. These tests were made 
on a SUN ULTRA 60 (SOLARIS 2.6, 256 Mb RAM, 9
Gb harddisk) with JAVA 1.2 (native threads). The
test setting did not involve any JUST IN TIME (JIT)
compiler. The speedup factor observed for static in- 
vocations when using a JIT compiler was over three. 
The speedup in the case of dynamic evaluation is, 
as expected, insignificant. 
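
The three invocation styles compared in Figure 17 can be reproduced with a few lines of code. The sketch below is not the benchmark used for our measurements: the class InvocationStyles and its methods are hypothetical, it uses a single zero-argument method (String.length()), and the absolute timings it prints depend entirely on the VM and its JIT compiler.

```java
import java.lang.reflect.Method;

/** Hypothetical micro-benchmark of the three invocation styles. */
final class InvocationStyles {

    /** (3) Static invocations. */
    static long staticCalls(String s, int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += s.length();
        }
        return sum;
    }

    /** (2) Dynamic invocations, the method object being looked up once. */
    static long dynamicCalls(String s, int n) throws Exception {
        Method length = String.class.getMethod("length");
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (Integer) length.invoke(s);
        }
        return sum;
    }

    /** (1) Dynamic invocations, each combined with a method lookup. */
    static long dynamicCallsWithLookup(String s, int n) throws Exception {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            Method length = s.getClass().getMethod("length");
            sum += (Integer) length.invoke(s);
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        int n = 1_000_000;
        long t0 = System.nanoTime();
        long a = staticCalls("abc", n);
        long t1 = System.nanoTime();
        long b = dynamicCalls("abc", n);
        long t2 = System.nanoTime();
        long c = dynamicCallsWithLookup("abc", n);
        long t3 = System.nanoTime();
        // The sums are printed so that the loops cannot be optimized away.
        System.out.println("static:         " + (t1 - t0) / 1_000 + " us, sum " + a);
        System.out.println("dynamic:        " + (t2 - t1) / 1_000 + " us, sum " + b);
        System.out.println("dynamic+lookup: " + (t3 - t2) / 1_000 + " us, sum " + c);
    }
}
```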


5.2 Optimizations 


The amount of expensive dynamic code can be reduced
if the type of the message objects is known.
The type information can be given to the system
either (1) by using reflection explicitly, or (2) by
specifying the type of the message objects by name.

Figure 17: Latency with Different Invocation Styles


The type information enables the application of optimizations.¹²


Avoiding redundant invocations. Message ob- 
jects are usually matched against patterns of several 
subscribers at a time, and these patterns are likely 
to present redundancies. We discuss here an opti- 
mization based on that observation, which is simi- 
lar, but not identical to the tree matching algorithm 
used in GRYPHON [ASS+98]. The tree matching
algorithm factors out redundant subpatterns with 
simplifying assumptions: only conjunctions (ands) of basic
conditions are considered, and the latter are primitive
comparisons of attribute values with predefined values.

In contrast, our filter library offers more expres- 
siveness, e.g., nested method invocations, different 
comparators and combinations (and, or, ...). Such 
combinations are performed statically, and dynamic 
queries on message objects represent the critical factor
in our system. As a consequence, we focus on
detecting common denominators of accessors, in or- 
der to avoid the evaluation of redundant dynamic 
method invocation chains. Figure 18 shows a sim- 
ple example of redundant accessors where each sub- 
scriber specifies a pattern consisting of a single basic 
condition. An invocation tree, like the one shown in 
Figure 19, is constructed from all accessors and is 
evaluated for every filtered message object.
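
The idea of sharing common accessor prefixes can be sketched with a simple prefix tree. The code below is a simplified illustration, not our implementation: the class InvocationTree is a hypothetical name, accessors are restricted to chains of zero-argument methods identified by name, and the example accessors on java.lang.String are chosen for the demonstration only.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical invocation tree over chains of zero-argument methods. */
final class InvocationTree {

    private static final class Node {
        final Map<String, Node> children = new LinkedHashMap<>();
        final List<String> subscribers = new ArrayList<>();
    }

    private final Node root = new Node();
    int invocations = 0; // number of dynamic invocations performed so far

    /** Registers the accessor of a subscriber; common prefixes share nodes. */
    void add(String subscriber, String... methodNames) {
        Node n = root;
        for (String name : methodNames) {
            n = n.children.computeIfAbsent(name, k -> new Node());
        }
        n.subscribers.add(subscriber);
    }

    /** Evaluates all accessors on m; every shared prefix is invoked only once. */
    Map<String, Object> evaluate(Object m) throws Exception {
        Map<String, Object> results = new LinkedHashMap<>();
        walk(root, m, results);
        return results;
    }

    private void walk(Node node, Object value, Map<String, Object> results) throws Exception {
        for (String s : node.subscribers) {
            results.put(s, value);
        }
        for (Map.Entry<String, Node> e : node.children.entrySet()) {
            Method method = value.getClass().getMethod(e.getKey());
            invocations++;
            walk(e.getValue(), method.invoke(value), results);
        }
    }

    public static void main(String[] args) throws Exception {
        InvocationTree tree = new InvocationTree();
        tree.add("S1", "trim");
        tree.add("S2", "trim", "length");
        tree.add("S3", "trim", "toUpperCase");
        tree.add("S4", "trim", "length");
        Map<String, Object> r = tree.evaluate("  Hello  ");
        System.out.println(r.get("S1"));      // Hello
        System.out.println(r.get("S2"));      // 5
        System.out.println(r.get("S3"));      // HELLO
        System.out.println(tree.invocations); // 3 instead of 7 without sharing
    }
}
```

Each condition would then apply its comparison to the result stored for its subscriber, so that only the comparisons remain specific to each subscription.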


¹² In a companion paper [EGS00b] we introduce type-based
publish/subscribe: a static classification scheme based on the 
types of message objects. The type-based publish/subscribe 
scheme ensures type safety, and thus enforces optimizations 
through the inherent knowledge of types and makes type 

Subscriber S1: A1 = ((M1, P1))

Subscriber S2: A2, = ((M2, P2), (Ma; Ps)) 
Subscriber S3: A3 = ((M2, P2), (M3, P3), (M4, P4))
Subscriber Sa: Aa, = ((Ma2, P2), (Ma, Ps)) 


Figure 18: Redundancy between Accessors 





Figure 19: Invocation Tree 


Enforcing static filters. Based on the observa- 
tion that dynamic invocations are far more costly 
than static ones, we have implemented an alter- 
native optimization. Any dynamic invocations are 
avoided by generating static source code from acces- 
sors after performing type checks. The source code 
is then directly compiled by calling the SUN JAVA 
compiler (sun.tools.javac), similarly to what
is done in [KMS98] or [TCKI00].¹³
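
The source generation step can be pictured as follows. This sketch only produces the source text of a static filter for an equality test on a chain of zero-argument methods; it is not the generator of our system. The class StaticFilterGenerator, the method generate and the shape of the generated class are assumptions made for this illustration, and the compilation step is omitted (on recent JAVA platforms the standard javax.tools.JavaCompiler API could take the role of sun.tools.javac).

```java
/** Hypothetical generator of static filter source code. */
final class StaticFilterGenerator {

    /**
     * Generates a class whose conforms() method evaluates, with static calls only,
     * expected.equals(((MessageType) m).m1().m2()...).
     */
    static String generate(String className, String messageType,
                           String[] methods, String expectedLiteral) {
        StringBuilder chain = new StringBuilder("((" + messageType + ") m)");
        for (String method : methods) {
            chain.append('.').append(method).append("()");
        }
        return "public final class " + className + " {\n"
             + "    public boolean conforms(Object m) {\n"
             + "        return m instanceof " + messageType + "\n"
             + "            && " + expectedLiteral + ".equals(" + chain + ");\n"
             + "    }\n"
             + "}\n";
    }

    public static void main(String[] args) {
        // Static counterpart of the onlyAlice condition of the chat example.
        System.out.println(generate("OnlyAlice", "ChatMsg",
                                    new String[] {"getSender"}, "\"Alice\""));
    }
}
```

Since the type of the message objects appears literally in the generated code, such a filter can only be produced once that type is known, which is why the type checks precede the generation.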


5.3 Evaluation


We evaluate here the benefits of the two above 
optimizations by comparing the resulting perfor- 
mances with two non-optimized scenarios. These 
are namely (1) the filtering of structurally confor- 
mant messages, and (2) the filtering of type confor- 
mant messages. 


Testbed. Our measurements were made with the
JAVA VM 1.2, with JIT enabled and native threads, on
SUN SOLARIS 2.6. A single producer was publishing
message objects encapsulating a single string
from one network (SUN ULTRA 60, 256 Mb RAM,
9 Gb harddisk), to subscribers equally distributed
over two further networks; one composed of altogether
60 SUN SUPERSPARC 20 stations (model 502:
2 CPU, 64 Mb RAM, 1 Gb harddisk), and the second
one composed of 60 SUN ULTRA 10 (256 Mb
RAM, 9 Gb harddisk) stations. The individual stations
and the different networks were communicating
via FAST ETHERNET.


checks and casts inside the application code superfluous.
¹³ [KMS98] terms this technique (runtime) linguistic reflection,
which is seen as a synonym of structural reflection.


Parameters. We have made a set of extensive 
tests, in which we have always varied one of four 
parameters for the subscriptions. These are namely, 
(1) the fraction of positive matches for a subscriber 
1/c, (2) the total number of subscribers s, (3) the 
maximum nesting level of invocations for queries a, 
and (4) the number of different query methods d at 
each nesting level. 


Varying 1/c: From n produced messages, an average
of n/c messages matched a given subscriber’s
pattern. Figure 20(a) shows the effect of varying
c. It confirms the intuition that the cost of send- 
ing messages with UDP does not depend on the 
matching scheme, and can be seen as fixed. With 
c > 100 in this scenario, the pure cost of match- 
ing is measured. In order to accentuate the dif- 
ferences between the matching schemes without 
contradicting our concrete applications, we have 
chosen c = 10 for the next figures. 


Varying s: Similarly to the scenario in Figure 18, we 
have chosen one basic condition per subscriber. 
Figure 20(b) reports the effect of scaling up s, 
conveying that the two optimizations are almost 
equivalent with a large s.¹⁴ As shown in the previous
figure, UDP is a limiting factor with an increasing
number of sends (here due to a large s).
Performance drops faster with static filters, since 
every additional subscriber involves a full pattern
evaluation. In contrast, the optimized dynamic 
scheme is less sensitive since redundant queries 
are avoided. 


Varying a: The probability of having $i \in [0, a]$
nested invocations was chosen as $p_a = 1/(a + 1)$.
Increasing a reduces throughput with static invocations
(Figure 21(a)), since static accessors comprise
more invocations. Similarly, the optimized
dynamic scheme is less efficient with an increasing
a, since the total number of performed methods
increases with the depth of the tree.


¹⁴ Our system relies on a hierarchical topology of message
brokers, among which membership information is split up. A
single process rarely has knowledge of more than 100 participants.


Figure 20: Matching Rate


Varying d: One of d methods was chosen at each
nesting level with a probability of $p_d = 1/d$. Varying
d obviously does not influence static filter evaluation.
On the other hand, increasing d might
lead to increasing the potential number of edges 
leaving from any node in the invocation tree. The 
resulting performance loss is directly visible in 
Figure 21(b). The optimized dynamic scheme is 
however more penalized by increasing a, as shown 
in the previous figure. This is due to the fact that 
increasing a by 1 might result in up to d new edges 
in every former leaf of the invocation tree. 


Interestingly the optimized dynamic matching 
scheme never outperforms the static scheme, even
if the speedups become close with a large number of 
message sends. One could believe that with a strong 
redundancy between patterns and a large number of 
subscribers the dynamic scheme would become more 
efficient. Even with extreme parameter values, we 
have however never encountered such a scenario. 


6 Discussion 


In this section we debate alternative models and realizations
we have considered as potential solutions
for an adequate content-based subscription scheme
in the context of DISTRIBUTED ASYNCHRONOUS
COLLECTIONS.


[Plot: throughput (msg/ms) versus number of subscribers; curves: Static, Opt. Dynamic, Dynamic w. lookup; caption fragment: "and Number of Subscriptions"]


Application-defined filters. We promote the 
expression of subscription patterns as a combina- 
tion of instances of our predefined condition classes. 
An alternative to this consists in allowing the appli- 
cation to provide directly its own static filter objects 
(byte code). Patterns expressed this way are how- 
ever opaque and not necessarily correct nor safe, and 
make optimizations difficult. 

Nevertheless, we have opted for an open de- 
sign, i.e., separation of interfaces and classes (e.g., 
Accessor/Invoke) vs conditions and accessors as 
final classes. This enables the extension of our 
subscription API with application-defined accessors 
and conditions. Our proposed optimizations can 
still be enforced by following certain design
guidelines.


Towards a unified language. An alternative to 
our subscription API consists in using the JAVA lan- 
guage itself as the subscription language. That is, 
providing code in a stringified form (source code), 
that can be parsed and compiled at runtime. A 
pseudo-variable m would represent the runtime message
object, and method invocations could be directly
expressed, e.g.:


"m.getSender().equals("Alice") && ...;"


The evaluation of the code given here as a string
is deferred. This amounts to introducing two levels
of programming, in a way similar to [NN88]. The
generalization of that approach leads to multi-stage
programming, e.g., METAML [TS97]. METAML is
a meta-programming language that was designed
as a homogeneous runtime code generator toolkit.
In METAML, the evaluation of expressions in <>
(called brackets) is deferred to the next stage, in a
sense similar to our stringified code delimited by '"'
above. Expressions evaluated at a later stage can refer
to constructs at a previous stage. When stringifying
meta-code in JAVA as above, this is not possible,
since JAVA reflection does not allow a reference
to a variable to be obtained dynamically by its name.
This limitation can be circumvented as long as invocation
arguments can be constructed inside the
pattern string (e.g., "Alice"), but poses problems
for complex matches. Extending the JAVA language
in the sense of METAML would have contradicted
our resolution of using merely standard language
constructs.


Figure 21: Expressiveness and Redundancy of Subscriptions


Java reflection. JAVASSIST [Chi00] and OPENJAVA
[TCKI00] are two approaches to extending
JAVA with load-time structural reflection, i.e., the
ability to modify classes at runtime prior to instantiation.
OPENJAVA promotes a compile-time
meta-object protocol [Chi95] based on an extension
of java.lang.Class, and makes use of the
SUN JAVA compiler, while JAVASSIST provides an
extended Classloader supporting the creation of
new methods as copies of existing ones.

We have however refrained from using JAVASSIST or 
OPENJAVA, because our static filters represent very 
specific classes which can be generated without any 
language extension. 





7 Concluding Remarks 


We argue through our work that, unlike what is of- 
ten claimed (e.g., [Koe99]), message-oriented mid- 
dleware and object-oriented principles are not con- 
tradictory. In [EGS00a], we have made a first 
step, by introducing a programming abstraction 
called DISTRIBUTED ASYNCHRONOUS COLLECTION 
(DAC) which is versatile enough to express commonalities
between the different message-oriented
interaction styles. In that paper we have focused
on topic-based publish/subscribe. 

In this paper, we have attacked another bastion, 
content-based publish/subscribe, which is presumed 
to contradict object-oriented principles by its very 
nature. We have illustrated that it is indeed pos- 
sible to express content-based subscription patterns 
in a way that fully preserves encapsulation. More- 
over, we have shown that our approach offers further 
practical benefits over contemporary approaches, 
like the possibility to prevent syntax errors and type 
errors. 

In terms of performance, the cost of our solution 
is incurred by the latency resulting from the use of 
JAVA reflection. In our case, this use is however re- 
duced to verifications of subscription patterns aim- 
ing at avoiding type errors. After this initializa- 
tion phase, dynamic invocations are circumvented 
by using static code generated at runtime without 
any modification to the JAVA compiler or VM. The 
throughput of our system is thus not conditioned by 
the use of reflection, as proven by the resulting per- 
formances. We are furthermore currently working 
on a new optimization scheme combining the ben- 
efits of our static and dynamic optimizations. The 
idea is to generate static code from dynamic invocation
trees, to further improve performance but also
to reduce the overall compilation effort. 

The cost of our solution in terms of feasibility is 
limited to the need for structural reflection; yet with 
such minimal features that the inherent JAVA reflec- 
tion capabilities can satisfy this need. 

We do not claim that our content-based subscription 
scheme is the ultimate solution to content-based 
publish/subscribe, nor that it replaces existing spec- 
ifications. It should rather be seen as a pragmatic 
attempt at circumventing shortcomings of other approaches.
Our filter library is not limited to the
context of DACs, but could be put to work easily in 
other existing event-based systems. 


Abstract


The C++ Standard Template Library provides effi- 
cient storage of data in containers, and efficient op- 
erations on such containers. While STL can be pa- 
rameterized with custom allocators, these cannot be 
used to add persistency to the container classes pro- 
vided by STL. Thus, we have designed the Persis- 
tent Standard Template Library (PSTL) that over- 
comes this by providing its own containers that are 
compatible with STL, but store their elements on 
disk. This compatibility provides a programming 
model that is known and more natural to C++ pro- 
grammers and enables the reuse of many of the algo- 
rithms provided by STL in combination with PSTL. 
In this paper we discuss PSTL’s design, show the 
challenges we faced, and how STL’s design would 
have to be extended to provide native support for 
persistency. 


1 Introduction 


Persistent container libraries such as [GND99, 
Sle00] have two key advantages. Compared to 
volatile containers, the size of their database can 
grow beyond the size of the available memory and 
compared to transactional databases, they exhibit 
less overhead since they do not provide any functionality
to roll back transactions. Persistent container
libraries are the key element of many programs
such as sendmail [CA97], NNTPCache [AB],
or NewsCache [GH99].


*This work was supported in part by a grant from the
USENIX Advanced Computing Association.

†Part of this work was implemented during an internship
at Hewlett Packard Labs, Palo Alto, CA 94304.


Today, a large number of persistent storage libraries 
exists, ranging from simple text-files to complex 
databases. The most prominent among these are 
the various types of dbm databases used for instance 
by sendmail. The creators of the Java program- 
ming language have even added a type to Java to 
indicate whether a given object may be stored on 
external storage [AG97]. Any object implementing 
the Serializable interface can be serialized and 
deserialized transparently. 


We have not found any persistent container for 
C++ that is compatible with STL and fulfills the 
requirements to be used within a server application.
The importance of persistent libraries has already
been identified by Bjarne Stroustrup. According
to [Ste95], his assumption was that persistency
could be provided using custom allocators. This 
assumption, however, was based on an early ver- 
sion of STL that had allocators that were real ob- 
jects [Str94, Str00]. In later versions of STL the al- 
locator mechanism was simplified due to logical and 
performance problems with that kind of generality. 
However, it is still unclear whether these problems 
would have been insurmountable [Str00]. Problems
with the current allocator mechanism have also been 
identified by [Ste98b] and will be explained in detail 
in Section 3. 


We think that compatibility with STL is one of the 
key requirements since it allows for a more natural 
object-oriented programming style and the reuse of 
many of the algorithms provided by STL. These al- 
gorithms range from iterating over the container’s 
elements to sorting the container and exchanging 
elements between different containers. Compatibility
is also one of the key challenges since some key
properties of persistent containers such as serializa- 
tion are not necessary for their volatile counterparts. 


The paper is structured as follows. Section 2 gives a 
brief description of the requirements for a persistent 
container library. Section 3 explains why the allo- 
cator mechanism provided by STL is not sufficient 
to fulfill these requirements. The design and imple- 
mentation of PSTL is presented in Section 4 along 
with the challenges we faced in preserving compat- 
ibility with STL. Section 5 presents an evaluation 
and related work is considered in Section 6. We 
outline our ideas for improving PSTL in Section 7 
and draw our conclusions in Section 8. 


2 Requirements 


The requirements for persistent containers are sim- 
ple. Data has to be stored on persistent storage, 
each container should use its own file, and the container
should be able to store more elements than
fit into the computer’s memory (main memory plus
swap space). Additionally, the containers should be 
simple to use and compatible with existing stan- 
dards. 


Since data is stored on disk, the objects stored in the 
container need to be serialized. Depending on the 
object, a memcpy() operation might be sufficient. 
In the following, we refer to these objects as simple 
objects and to those that require special serialization 
functionality (i.e., objects using pointers) as fragile 
objects. 
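
The distinction can be illustrated with a minimal sketch. The types Point and Label and the store/load helpers below are hypothetical and not part of PSTL; they only show that a pointer-free object can be copied byte by byte, whereas an object that owns heap memory has to be written out field by field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical "simple" object: no pointers, so its bytes are its serialized form.
struct Point {
    int x;
    int y;
};

// Hypothetical "fragile" object: std::string owns heap memory through a pointer,
// so copying the raw bytes would store an address that is meaningless later.
struct Label {
    int id;
    std::string text;
};

// Simple objects: a plain memcpy() into the storage buffer is sufficient.
inline void store_simple(const Point &p, std::vector<char> &out) {
    out.resize(sizeof(Point));
    std::memcpy(&out[0], &p, sizeof(Point));
}

inline Point load_simple(const std::vector<char> &in) {
    assert(in.size() == sizeof(Point));
    Point p;
    std::memcpy(&p, &in[0], sizeof(Point));
    return p;
}

// Fragile objects: write the id, the length, and then the characters themselves.
inline void store_fragile(const Label &l, std::vector<char> &out) {
    const std::size_t len = l.text.size();
    out.resize(sizeof(int) + sizeof(std::size_t) + len);
    char *dst = &out[0];
    std::memcpy(dst, &l.id, sizeof(int));
    std::memcpy(dst + sizeof(int), &len, sizeof(std::size_t));
    if (len > 0)
        std::memcpy(dst + sizeof(int) + sizeof(std::size_t), l.text.data(), len);
}

inline Label load_fragile(const std::vector<char> &in) {
    const std::size_t header = sizeof(int) + sizeof(std::size_t);
    assert(in.size() >= header);
    Label l;
    std::size_t len = 0;
    std::memcpy(&l.id, &in[0], sizeof(int));
    std::memcpy(&len, &in[0] + sizeof(int), sizeof(std::size_t));
    assert(in.size() == header + len);
    l.text.assign(&in[0] + header, len);
    return l;
}
```

The fragile variant needs knowledge of the object's layout, which is why this work has to be delegated to user-supplied serialization code.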


Since persistent containers are frequently used by 
server daemons to store their databases (e.g., send- 
mail, NewsCache, ...), the database must be acces- 
sible by several different processes. This is because 
daemons handle several clients simultaneously and 
typically spawn a new process for each client con- 
nection. Depending on the daemon, the database 
is created once and accessed read-only or is manip- 
ulated during the program’s operation. If the con- 
tainer is manipulated, support for locking is needed 
and changes need to be visible to other processes 
immediately. 


In the case of the C programming language, these 
requirements are fulfilled by [GND99], for instance. 
For C++, however, we could not find a persistent 
container class library fulfilling these requirements 
and seamlessly integrating with STL. 


PSTL was primarily developed to replace News- 
Cache’s [GH99] newsgroup and article database. 
Thus, the containers have to be efficient enough 
for handling the typical Usenet traffic. The typ- 
ical spool size of a Usenet News server is about 
60GB and the traffic is at least 3-5GB of article data 
per day [CBB97] excluding the traffic generated by 
news reading clients. To ease the maintenance of 
the spool area, NewsCache stores each newsgroup 
in a separate file/container. 


3 STL and Persistence 


The containers provided by STL are stored in main 
memory which is volatile, but can be parameterized 
with different memory allocators. Thus, our initial 
approach was to provide a custom allocator that 
allocates its elements on persistent storage. 


While the constructors of the container classes do 
not directly provide an argument to pass the name 
of the file the container should be stored in, they 
provide constructors taking a custom allocator class 
as argument that could encapsulate the container’s 
filename. 


The custom allocator class, however, only provides 
methods for allocating and freeing memory and for 
constructing and destroying a given object [Str97, 
ISO98]. There is no mechanism to query the allocator 
for previously stored elements. Thus, the 
constructors of STL’s container classes do not check 
for pre-existing elements. The only way to fix this 
problem is to subclass the original STL class and re- 
place the constructors and destructor as identified 
in [Ste98b, Ste98a]. 
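
The limitation can be reproduced with a small sketch. The class file_allocator below is hypothetical: it merely remembers a file name and still allocates from the heap, and it is written against the minimal allocator interface of current C++ rather than the 1998 one. It shows that a standard container only ever asks its allocator for raw memory and therefore always starts out empty.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <vector>

// Hypothetical allocator that remembers a file name. It still allocates from the
// heap; the point is only to show what a container can ask of its allocator.
template <class T>
struct file_allocator {
    typedef T value_type;

    std::string filename;

    explicit file_allocator(const std::string &name) : filename(name) {}

    template <class U>
    file_allocator(const file_allocator<U> &other) : filename(other.filename) {}

    T *allocate(std::size_t n) {
        assert(n > 0);
        return static_cast<T *>(::operator new(n * sizeof(T)));
    }
    void deallocate(T *p, std::size_t) { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const file_allocator<T> &a, const file_allocator<U> &b) {
    return a.filename == b.filename;
}
template <class T, class U>
bool operator!=(const file_allocator<T> &a, const file_allocator<U> &b) {
    return !(a == b);
}

typedef std::vector<int, file_allocator<int> > file_vector;

// Builds a vector on top of the allocator. The container never asks the
// allocator for existing elements, so it always starts out empty.
inline file_vector open_vector(const std::string &name) {
    return file_vector(file_allocator<int>(name));
}
```

Even if allocate() returned memory backed by a file, a second call to open_vector() with the same name would yield an empty vector, because nothing in the interface lets the container recover its previous size or contents.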


It is insufficient, however, to simply replace the con- 
structors and destructor of the original STL con- 
tainer except in the simple case where the construc- 
tor reads all elements from the file and the destruc- 
tor writes all elements back to the file [Ste98b]. This 
is due to the fact that typical STL implementations 
make certain legitimate optimizations based on 
the internal representation of the container. For 
instance, GNU C++’s STL implementation uses a 
pointer as the vector’s iterator. While this is fine for 
a non-persistent container, it cannot be used for a 
persistent container as we will show in the following 
sections. 


Another drawback is that a standard allocator is 
assumed to hold no per-object data [Str97, Chapter 
19.4.3], allowing the library to implement some 
container-manipulation functions by relinking ele- 
ments. For instance, splice() may be implemented 
by moving an element from one list to another with- 
out copying the element. This is not possible if 
different lists can be stored in different files. In 
this case the elements have to be copied from one 
container/file to the other. Thus, the allocator is 
bound to the type the container is parameterized 
with rather than to the container. Hence, this would 
restrict a program to use the same file for all con- 
tainers parameterized with the same value type. 
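
The relinking behavior is easy to observe with std::list. The helper below is only an illustration (its name is made up); it moves one element between two lists and checks that the element still lives at the same address afterwards, which would be impossible if the two lists were backed by different files.

```cpp
#include <cassert>
#include <list>

// Moves the first element of `from` to the end of `to` and reports whether the
// element kept its address, i.e. whether the node was relinked rather than copied.
inline bool splice_keeps_address(std::list<int> &from, std::list<int> &to) {
    assert(!from.empty());
    const int *before = &from.front();
    to.splice(to.end(), from, from.begin());
    const int *after = &to.back();
    return before == after;
}
```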


For these reasons, it is not possible to convert 
an STL container into a persistent STL container 
just by supplying a different allocator class to its 
constructor. Adding persistence requires a tighter 
coupling between the container and its correspond- 
ing allocator. Thus, we were forced to reimplement 
the containers provided by STL. The containers pro- 
vided by PSTL are compatible with their corre- 
sponding STL containers, but use an extended in- 
terface provided by our persistent allocators. 


4 Design and Implementation 


Since PSTL tries to adhere to the STL specification, 
most of PSTL’s design is equivalent to STL’s design. 
The differences between PSTL and the typical STL 
implementation are outlined in the following sec- 
tions. For instance, we had to add a serialization 
mechanism that provides transparent serialization 
of the container’s elements. Additionally, we added 
an argument to the container’s constructors to 
indicate the file in which the elements should be 
stored. This allows the user to instantiate a container 
without having to explicitly instantiate the corre- 
sponding persistent allocator. 


4.1 Serialization 


As mentioned in Section 2, special care needs to be 
taken when fragile objects need to be stored in the 
persistent container. C/C++ pointers cannot be 
stored because the data stored at the address the 
pointer is pointing to will likely be located at a 
different address at the program’s next invocation. 
This problem can be solved by using a pointer 
swizzling technique as presented in [SKW92] or by 
serializing the object before storing it on disk. Since 
the pointer swizzling technique has a major drawback 
with regard to concurrent accesses, as we will 
point out in Section 6.5, we have chosen to store the 
objects in a serialized form. 


STL itself does not provide any functionality for the 
serialization of its elements because the container’s 
elements are stored in memory and do not have to 
be made persistent. Thus, we had to extend the 
interfaces defined by STL with a means for serializ- 
ing and deserializing fragile objects. Typically, this 
functionality is implemented using one of the follow- 
ing approaches. 


• Instrumentation of the data structures [Kni99]. 


• Requiring the user to provide the code for 
  serializing and deserializing the data structures 
  stored within the container [GND99, Sle00]. 


The first approach allows the container to iden- 
tify the type of the data stored at each position. 
This makes it possible to use a generic algorithm 
for garbage collection and defragmentation of the 
database. The price for this, however, is an in- 
creased size of the container and that only instru- 
mented data structures may be stored. Another 
disadvantage is that the layout of the data struc- 
ture is pre-determined and cannot be changed at 
run-time thus making the exploitation of polymor- 
phism difficult. Since we want our implementation 
to be as compatible with the STL as possible, and 
no garbage collection mechanisms are provided by 
STL anyway, we have chosen the second approach. 


A straightforward implementation would be to re- 
quire the classes of the data structures to be stored 
to provide a member function that implements the 
serialization and deserialization functionality. This 
implementation, however, would have two short- 
comings. It cannot be easily added to any exist- 
ing class and it does not work in combination with 
builtin types. 


Thus, we have decided to use traits [Mye95, Str97] 
for PSTL. Traits allow us to extend a template 
argument without requiring it to provide the 
extension. The extension is moved to a trait class which 
is supplied as an additional template argument to 
the container class. An advantage of this implemen- 
tation is the possibility to use different serialization 
algorithms for different container instances param- 
eterized with the same value type. 


template <class T, class Tr=serialize_trait<T> > 
class pvector { 
  // typedefs, ... 
  reference 
  front() { 
    // See Section 4.5 for a description 
    // of getdata 
    offset_type head=alloc.getdata(); 
    return Tr::deserialize(head); 
  } 
}; 


Figure 1: Vector Using Serialization Trait 


A simplified version of the pvector container us- 
ing the serialization trait class is shown in Figure 1. 
Whenever the container needs to serialize or deseri- 
alize an object it calls the corresponding functions 
of the trait class. The container and the serializa- 
tion classes have both a reference (alloc) to the 
persistent allocator class and can use its functions 
for allocating memory on the persistent file and con- 
verting offsets to pointers and vice versa. 


Our persistent container classes provide trait classes 
for storing builtin types, objects that do not have 
any pointers, and strings. The traits for the serial- 
ization of other classes must be provided by the user 
of the persistent template library. The implementation 
of a custom serialization trait is fairly simple. 
The interface that has to be provided is shown in 
Figure 2. 
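
The benefit of passing the trait as a template argument can be sketched in a few lines. Everything below is hypothetical and much simpler than PSTL's real interface: the two trait classes use static functions and a std::string as the "serialized" form, and string_store keeps its records in a std::vector instead of a file. The sketch only shows that two containers with the same value type can use different serialization algorithms.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical trait: stores a string verbatim.
struct plain_string_trait {
    static std::string serialize(const std::string &s) { return s; }
    static std::string deserialize(const std::string &raw) { return raw; }
};

// Hypothetical trait: stores a string reversed (a stand-in for any
// alternative encoding such as compression or byte swapping).
struct reversed_string_trait {
    static std::string serialize(const std::string &s) {
        return std::string(s.rbegin(), s.rend());
    }
    static std::string deserialize(const std::string &raw) {
        return std::string(raw.rbegin(), raw.rend());
    }
};

// Toy container: keeps only serialized records and calls the trait on
// every access, so the value type itself needs no serialization members.
template <class Tr>
class string_store {
    std::vector<std::string> records_;  // stands in for the persistent file
public:
    void push_back(const std::string &s) { records_.push_back(Tr::serialize(s)); }
    std::string at(std::size_t i) const {
        assert(i < records_.size());
        return Tr::deserialize(records_[i]);
    }
    const std::string &raw(std::size_t i) const {
        assert(i < records_.size());
        return records_[i];
    }
};
```

Both instantiations return the same value from at(), while raw() exposes the different on-disk representations.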


4.2 Low-Level Disk Access 


So far, we have not discussed how and when the 
data is stored onto disk. Typical solutions to this 
problem are: 


• Read the data from disk in the constructor of 
  the container and write it back in its destructor 
  [Ste98b]. Unfortunately, this approach violates 
  two of our requirements. It does not allow 
  the container to grow beyond the memory 
  size and modifications of the container won’t be 
  visible to other processes until the container is 
  freed or flushed explicitly. 


• Whenever an element needs to be accessed, seek 
  to the element’s position and read it from or 
  write it to disk [GND99, Nel98]. This requires 
  the elements to be read from disk and to be 
  copied from/to the I/O buffers every time an 
  element is accessed. Since all modern operating 
  systems provide elaborate caching strategies, 
  the disk access is negligible. The real problem 
  is that every element needs to be copied at 
  each access no matter whether it actually would 
  have to be serialized or not. 


template <class T> struct serialize_trait { 
  // typedefs, ... 
  serialize_trait(allocator_type &a) 
    : alloc(a) {} 
  void 
  serialize(const T &t, offset_type o) { 
    // serialize t to alloc.off2ptr(o) 
  } 
  reference 
  deserialize(offset_type o) { 
    // deserialize element and return a 
    // reference to it 
  } 
  const_reference 
  deserialize(offset_type o) const { 
    // deserialize element and return a 
    // constant reference to it 
  } 
  size_type 
  size() { 
    // return T’s size; in case of a complex 
    // object use an offset here and allocate 
    // extra memory via the persistent 
    // allocator 
  } 
}; 


Figure 2: Sample Serialization Trait 


PSTL, however, uses memory mapped files, an ap- 
proach different from both of the approaches pre- 
sented above. Memory mapped files have the ad- 
vantage that the operating system maps the file as- 
sociated with the container into the program’s ad- 
dress space and the program can read and write the 
file like memory allocated using malloc(). 
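
As a minimal sketch of this mechanism on a POSIX system, the two helpers below map a file into the address space and flush it again. The function names and the fixed-size, single-mapping design are assumptions made for illustration; PSTL's actual implementation is not shown in this paper.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Maps `size` bytes of the file at `path` (created if missing) into memory.
// Returns a null pointer on failure.
inline char *map_file(const char *path, std::size_t size) {
    assert(size > 0);
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) return nullptr;
    // Make sure the file is large enough to back the whole mapping.
    if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
        close(fd);
        return nullptr;
    }
    void *p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return nullptr;
    return static_cast<char *>(p);
}

// Flushes the mapping to the file and removes it from the address space.
inline bool unmap_file(char *base, std::size_t size) {
    bool synced = msync(base, size, MS_SYNC) == 0;
    bool unmapped = munmap(base, size) == 0;
    return synced && unmapped;
}
```

Because the mapping is created with MAP_SHARED, bytes written through the returned pointer reach the file and become visible to other processes that map the same file.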


As shown in Figure 3, PSTL maps the container’s 
file storing the serialized representation of the 
container into its address space. Whenever an object is 
accessed, it is deserialized and thus copied. In the 
figure below, this is the case with the container’s 
first element, the author’s address. Now, the 
program can interact with the object representing the 
element like with any other object. Finally, when 
the object is destroyed, it is serialized back to the 
memory mapped file. 


Figure 3: Serialization and Deserialization of Elements 


Using memory mapped files has several advantages: 


• The program can operate directly on the memory 
  mapped file. No copying from and to I/O 
  buffers is necessary. 


• When virtual memory gets tight, the contents 
  of the container’s associated file are swapped 
  back to the file instead of to the swap space. 


• User programs may use pointers pointing into 
  the memory mapped file and may act on them 
  like on all other pointers. 


• Simple objects do not have to be deserialized 
  and copied from the container. Since their 
  deserialized representation is the same as their 
  serialized one, it is sufficient to return a 
  reference/pointer to the object’s address in the 
  memory-mapped area. 


While this approach sounds simple, much care needs 
to be taken with its implementation. The size of the 
area to be mapped into memory needs to be known 
a priori. If more space is required than originally re- 
quested, the file needs to be resized and the memory 
area needs to be remapped. In this case all pointer 
references to the memory mapped region might have 
to be calculated anew since it cannot be guaranteed 
that the resized container can be remapped to the 
same address as before [Bac86, LMKQ90]. 


Theoretically, it would be possible to map different 
parts of the file to different memory regions and 
thus get around this problem. This, however, 
would require the container to maintain a mapping 
for each region, and address calculation would get 
complicated and most probably inefficient. 


Unfortunately, no memory management is available 
for the memory provided by the memory mapped 
file. Thus, we had to implement our own persis- 
tent allocator class that manages the free blocks of 
the memory mapped file. The interface provided by 
the persistent allocators is explained in Section 4.5. 
Similar to STL, this allocator class is used by all 
persistent containers. 


4.3 References 


The most challenging part of PSTL with regard 
to STL-compatibility was the references returned 
by some of the container’s member-functions (e.g., 
front()). For this problem, however, it is impor- 
tant to distinguish between fragile objects (objects 
using pointers) and simple objects (objects without 
pointers). 


If the container only stores simple objects, it is suf- 
ficient to return a reference to the memory address 
the object is stored at. This is because the object’s 
serialized and deserialized representation is the same 
and the object’s size is well known and does not 
change. Hence, there is no danger of accidentally 
overwriting the container’s internal data. 


If the container stores fragile objects, however, the 
object needs to be deserialized and thus copied into 
a temporary buffer. Otherwise, the user would not 
be able to use it. However, if the deserialized copy 
changes, we want the serialized version of the ele- 
ment to be changed as well. 


The straightforward approach would be the provi- 
sion of a wrapper class overriding all of the original 
methods and serializing the element back to disk 
when necessary. This is ugly, however, and is not 
guaranteed to work since some of the class’s non- 
virtual functions might have been expanded inline. 
Fortunately, this problem can be solved cleanly us- 
ing templates. 


PSTL returns a reference object that encapsulates 
the position of the data structure in the memory 
mapped area, provides a deserialized copy of the 
data structure and in case of a non-constant refer- 
ence writes the element back to disk when the ref- 
erence object is destructed. This is achieved by a 
generic wrapper similar to the one shown in Fig- 
ure 4. The wrapper only needs to be parameterized 
with the value-type’s serialization trait. 


template <class Tr> 
class __pstl_ref: public Tr::value_type { 
  // typedefs, ... 
  serializer_type *s; 
  offset_type o; 
public: 
  __pstl_ref(const value_type &x, 
             Tr *ser, offset_type off) 
    : value_type(x), s(ser), o(off) {} 
  ~__pstl_ref() { 
    s->destruct(o); 
    s->serialize(*(value_type*)this,o); 
  } 
}; 


Figure 4: PSTL’s Reference Class 
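
The write-back behavior of such a wrapper can be tried out with a self-contained sketch. The names slot, string_ref, and append_mark are hypothetical, and a std::string member stands in for the serialized bytes in the mapped file. The wrapper derives from the value type, so it can be used like the value itself, and its destructor copies the value back.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for one serialized slot in the mapped file.
struct slot {
    std::string bytes;
    int writes;  // counts how often the slot was written back
    slot() : bytes(), writes(0) {}
};

// Proxy in the spirit of the reference class: it is-a std::string holding the
// deserialized copy, and its destructor serializes the copy back to the slot.
class string_ref : public std::string {
    slot &s_;
public:
    explicit string_ref(slot &s) : std::string(s.bytes), s_(s) {}
    ~string_ref() {
        // Written back unconditionally, whether or not the copy was modified.
        s_.bytes = static_cast<const std::string &>(*this);
        ++s_.writes;
    }
};

// Modifies an element through the proxy.
inline void append_mark(slot &s) {
    const int writes_before = s.writes;
    {
        string_ref r(s);
        r += "!";                           // use the proxy like any std::string
        assert(s.writes == writes_before);  // nothing is written while r is alive
    }                                       // r is destroyed here
    assert(s.writes == writes_before + 1);  // exactly one write-back happened
}
```

The sketch also makes the cost visible: every non-constant reference causes a write, which is why two live wrappers for the same element could overwrite each other's changes.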


While PSTL’s reference objects ensure that changes 
will be written back to disk after the element’s mod- 
ification, the element will be written back to disk 
only after the reference’s destruction. To prevent 
multiple wrappers of the same element from interfering 
with each other, a cache object should keep 
track of the instantiated wrapper classes and en- 
sure that no more than one wrapper is instantiated 
for a given element within one process. The lock- 
ing mechanism ensures that different processes are 
only allowed to have constant references at the same 
time. 


4.4 Locking 


Since we want several processes to be able to access 
PSTL’s containers simultaneously, we added a 
locking mechanism that ensures mutual exclusion. 
The current implementation of PSTL simply adds 
two new functions to each container: lock() and 
unlock(). lock() locks the container for shared 
or exclusive access and unlock() unlocks the con- 
tainer. PSTL also maintains an internal lock stack 
to ensure that different functions locking the same 
container do not interfere with each other. For in- 
stance, a function might request to lock a container 
that has already been locked by its process (i.e., by 
its caller). In this case lock() does not have to 
lock the container since it is already locked. When 
the function unlocks the container, however, the 
container must not be unlocked since the container 
should be still locked by the caller. 
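
The nesting rule described above can be sketched with a simple depth counter. The class nested_lock is hypothetical and ignores the distinction between shared and exclusive locks; the two counters merely record where a real implementation would call the inter-process locking primitive.

```cpp
#include <cassert>

// Hypothetical per-process lock bookkeeping. The os_* counters stand in for
// calls to the real inter-process primitive (for example a file lock).
class nested_lock {
    int depth_;       // how many lock() calls are currently outstanding
    int os_locks_;    // how often the underlying lock was actually taken
    int os_unlocks_;  // how often it was actually released
public:
    nested_lock() : depth_(0), os_locks_(0), os_unlocks_(0) {}

    void lock() {
        if (depth_++ == 0) ++os_locks_;  // only the outermost call locks
    }
    void unlock() {
        assert(depth_ > 0);
        if (--depth_ == 0) ++os_unlocks_;  // only the outermost call unlocks
    }
    bool locked() const { return depth_ > 0; }
    int os_locks() const { return os_locks_; }
    int os_unlocks() const { return os_unlocks_; }
};
```

A callee that locks and unlocks a container already locked by its caller thus leaves the caller's lock untouched.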


Based on the design of STL, however, it should be 
possible to add a transparent locking facility. Whenever 
an iterator is requested, the container could be 
locked and with the iterator’s destruction it could be 
unlocked. Depending on a constant or non-constant 
iterator, the container would be locked shared or 
exclusively. The same would apply to references re- 
turned by the container. Unfortunately, depending 
on the container’s value type PSTL uses a wrap- 
per class or a builtin C++ reference. While the 
wrapper class would support a transparent locking 
mechanism, the latter does not since C++’s builtin 
references do not provide a destructor that could be 
used for unlocking the container. A simple solution 
of course would be to always return a wrapper ob- 
ject. This issue will be attacked in future versions 
of PSTL. 
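
One possible shape of such a facility is a handle that locks in its constructor and unlocks in its destructor, which an iterator or reference wrapper could embed. The sketch below is an assumption about how this might look, not PSTL code: lock_state and lock_handle are made-up names, and conflicting requests trigger an assertion where a real implementation would block.

```cpp
#include <cassert>

// Hypothetical container-side lock state.
struct lock_state {
    int shared;     // number of live read-only handles
    int exclusive;  // number of live read-write handles
    lock_state() : shared(0), exclusive(0) {}
};

// Handle that locks on construction and unlocks on destruction.
class lock_handle {
    lock_state &st_;
    bool exclusive_;
    lock_handle(const lock_handle &);             // not copyable
    lock_handle &operator=(const lock_handle &);  // not assignable
public:
    lock_handle(lock_state &st, bool exclusive) : st_(st), exclusive_(exclusive) {
        if (exclusive_) {
            // A writer needs the container for itself.
            assert(st_.shared == 0 && st_.exclusive == 0);
            ++st_.exclusive;
        } else {
            // Readers may share the container with other readers.
            assert(st_.exclusive == 0);
            ++st_.shared;
        }
    }
    ~lock_handle() {
        if (exclusive_) --st_.exclusive;
        else --st_.shared;
    }
};
```

A constant iterator would hold a shared handle and a non-constant iterator an exclusive one. A builtin C++ reference has no place to store such a handle, which is the obstacle described above.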


4.5 Implementing More Persistent Containers 


In case the container types provided by PSTL are 
not sufficient, it is easy to implement new ones that 
better fit the requirements of special purpose appli- 
cations. The implementation of a persistent con- 
tainer is similar to the implementation of a volatile 
container, except that offsets have to be used in- 
stead of pointers and instead of allocating memory 
using new, it has to be requested from the persistent 
allocator class. The persistent allocator class pro- 
vides the following functions for the management of 
the container’s persistent memory: 


off2ptr()/ptr2off() convert offsets to pointers 
and vice versa. While offsets must be stored in 
the container, they need to be frequently converted 
to pointers to operate on the memory-mapped 
storage area. 


nvalloc()/nvfree() allocate/free a memory 
area within the file. Whenever one of these 
functions is called, the container’s storage area 
might have to be resized and might have to be 
remapped to a different memory area. Thus, 
pointers into the persistent storage area need 
to be recalculated or verified after calling these 
functions. 


getdata()/setdata() returns/sets the offset to 
the root memory area of the container’s data 
structure. Using getdata(), the persistent 
containers gain access to pre-existing elements 
at the container’s construction time. 
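
The interplay of these functions can be sketched with an in-memory stand-in. The class toy_arena below is hypothetical: it borrows the function names from the list above, but it uses a std::vector<char> instead of a mapped file, allocates by bumping the end of the buffer, and omits nvfree(). Growing the vector may move its storage, which mimics a remap to a different address.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::size_t offset_type;

// Hypothetical in-memory stand-in for the mapped file.
class toy_arena {
    std::vector<char> bytes_;
    offset_type root_;  // offset of the container's root data structure
public:
    toy_arena() : bytes_(), root_(0) {}

    char *off2ptr(offset_type o) {
        assert(o < bytes_.size());
        return bytes_.data() + o;
    }
    offset_type ptr2off(const char *p) const {
        assert(p >= bytes_.data() && p < bytes_.data() + bytes_.size());
        return static_cast<offset_type>(p - bytes_.data());
    }
    // Bump allocation only; a real allocator would also manage free blocks.
    offset_type nvalloc(std::size_t n) {
        assert(n > 0);
        offset_type o = bytes_.size();
        bytes_.resize(bytes_.size() + n);  // may invalidate earlier pointers
        return o;
    }
    offset_type getdata() const { return root_; }
    void setdata(offset_type o) { root_ = o; }
};
```

Offsets stay valid across growth, whereas any pointer obtained from off2ptr() before a call to nvalloc() has to be recomputed afterwards.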


5 Evaluation 


We evaluated PSTL in terms of compatibility by 
adapting some sample applications we had written 
previously to use the containers provided by 
PSTL instead of those provided by STL. Though 
PSTL has not yet been tuned for performance, we 
will also give a short performance evaluation to give 
the reader a rough estimate of PSTL’s current per- 
formance. 


5.1 Compatibility 

The main difference from STL is the extension to 
allow the user to supply a custom serialization func- 
tionality. While we have included some serialization 
classes for common value types, it is likely that the 
user will have to supply a custom serialization trait 
for his own value types. Additionally, the filename 
of the container needs to be specified when instan- 
tiating a PSTL container. 


vector<int> v; // STL vector 

pvector<int> pv("/tmp/vector"); // PSTL vector 

pvector<int,pallocator,myserializer> 
pv2("/tmp/vector2"); // use my serializer 


When converting our test applications from using 
the containers provided by STL to the ones pro- 
vided by PSTL, we identified that users typically 
make assumptions about the implementation of the 
containers. A common assumption is that the it- 
erator of a vector is implemented as a pointer or 
that a reference is implemented as a C++ refer- 
ence. While this works in combination with most 
STL implementations, it is not compliant with the 
standard [ISO98]. 


T* i=pv.begin(); // error 
pvector<T>::iterator j=pv.begin(); // ok 
T& r=pv.front(); // error 
pvector<T>::reference s=pv.front(); // ok 


As we have explained in Section 4.3, PSTL uses its 
own wrapper class as a reference. Fortunately, this 
is identified by the compiler and thus can be cor- 
rected by the user easily. The same applies to ref- 
erences returned by PSTL. 


PSTL’s non-constant references copy their elements 
back to disk, regardless of whether they have been 
modified or not. This is not the case for constant 
references as they must not be changed by definition 
and thus do not have to be written back to 
disk. Hence, if performance is of importance, one 
should distinguish between member functions 
returning constant (e.g., front() const) and 
non-constant references (e.g., front()). 


C++ resolves the member function to be called 
based on its function name and the arguments in- 
cluding the implicit this argument. Only in case 
of a constant object pointer/reference, the constant 
method will be chosen. The following code clarifies 
this. 


pvector<int> pv1("/tmp/filename"); 
const pvector<int> &pv2=pv1; 


/**** ok, but inefficient ****/ 
pvector<int>::reference ref= 
pv1.front(); // call ref front() 
pvector<int>::const_reference cref1= 
pv1.front(); // call ref front() and 
// convert ref to const_ref 


/**** ok, and efficient ****/ 
pvector<int>::const_reference cref2= 
pv2.front(); // call const_ref front() const 


A set of polymorphic elements is typically stored 
by parameterizing the container with the pointer to 
the base class (<base_type*>) and allocating the 
elements on the heap. Achieving the same behav- 
ior with PSTL requires the user to specify a custom 
serialization trait for base_type*. Serialization is 
straightforward by calling the appropriate serializa- 
tion function. To allow for extensibility the deseri- 
alization function should be implemented using ex- 
emplar types [Cop92] (sometimes also referred to as 
virtual constructors). 
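
A minimal sketch of exemplar-based deserialization is given below. The shape hierarchy, the tag format ("name:payload"), and the exemplar_registry class are all hypothetical and only illustrate the idiom: each class can create an empty object of its own dynamic type, and a registry of prototype objects selects the right one from the tag stored with the data.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical class hierarchy; every class can write itself and create an
// empty instance of its own dynamic type (the "virtual constructor").
struct shape {
    virtual ~shape() {}
    virtual std::string tag() const = 0;
    virtual shape *make() const = 0;       // create an empty sibling
    virtual std::string save() const = 0;  // payload without the tag
    virtual void load(const std::string &payload) = 0;
};

struct circle : shape {
    std::string radius;
    std::string tag() const { return "circle"; }
    shape *make() const { return new circle; }
    std::string save() const { return radius; }
    void load(const std::string &payload) { radius = payload; }
};

struct square : shape {
    std::string side;
    std::string tag() const { return "square"; }
    shape *make() const { return new square; }
    std::string save() const { return side; }
    void load(const std::string &payload) { side = payload; }
};

// Registry of exemplars, keyed by tag. New subclasses are registered
// without any change to the deserialization code.
class exemplar_registry {
    std::map<std::string, const shape *> exemplars_;  // not owned
public:
    void add(const shape *e) {
        assert(e != 0);
        exemplars_[e->tag()] = e;
    }
    std::string serialize(const shape &s) const { return s.tag() + ":" + s.save(); }

    // Returns a heap-allocated object owned by the caller, or null if the
    // record is malformed or its tag is unknown.
    shape *deserialize(const std::string &record) const {
        std::size_t colon = record.find(':');
        if (colon == std::string::npos) return 0;
        std::map<std::string, const shape *>::const_iterator it =
            exemplars_.find(record.substr(0, colon));
        if (it == exemplars_.end()) return 0;
        shape *out = it->second->make();
        out->load(record.substr(colon + 1));
        return out;
    }
};
```

A serialization trait for the base class pointer could forward to such a registry, so that adding a new subclass only requires registering one more exemplar.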


If the performance of manipulating persistent el- 
ements is of major concern the user may spec- 
ify his own reference type within the serialization 
trait. This allows the reference type to gain access 
to PSTL’s data allocator and manipulate the data 
structure in the persistent storage area directly. 


Even though, in case of concurrent access to the 
containers, PSTL requires the containers to be locked 
and unlocked explicitly, this is not a compatibility 
problem. STL implementations do not implement 
persistent containers and thus do not allow 
simultaneous access. In case a program should be retrofitted 
to allow multiple processes to access the same data 
structure, it is necessary to review the code for 
possible deadlock situations anyway. In future versions 
of PSTL, however, we will try to implement a trans- 
parent locking facility as mentioned in Section 4.4. 


5.2 Performance 


So far, our efforts on PSTL focused on compatibil- 
ity with STL and not on its performance. How- 
ever, we were still curious to see how it would per- 
form in comparison to Berkeley DB [Sle00] (with 
logging for recovery or transactions disabled) and 
gdbm [GND99], the leading persistent container li- 
braries available for Unix. 


The computer used for this benchmark was a 
Pentium II (350MHz), 256MB of RAM, and a 
Seagate 4GB hard drive (ST34323A). The com- 
puter was running RedHat 6.1 (Linux kernel 2.2.12, 
glibc 2.1) and all container libraries and test 
applications were compiled with gcc-2.95.2 using -O2 
for optimization. Each application was executed 
5 times and the median was chosen for our perfor- 
mance comparison. 


Due to the lack of available benchmarks for the eval- 
uation of persistent container libraries, we have im- 
plemented our own applications: an address book 
mapping family names to addresses and phone num- 
bers and a resource reservation system. Both appli- 
cations are based on associative containers with the 
difference that the resource reservation system uses 
a simple object as key and thus favors PSTL. In 
the following, however, we will limit our discussion 
to the address book application (Table 1). The 
results of the resource reservation system are available 
from the PSTL web site together with the source 
code of the benchmarks. 


              59840 entries          148397 entries 
Database      PSTL   BDB   gdbm      PSTL      BDB 
Insertion                            31.65    45.28 
Iteration                            10.30    20.28 
Lookup                                5.86    16.00 
Deletion                            482.19    27.58 


Table 1: Address Book Benchmark (seconds) 


Table 1 shows the results for two different database 
sizes: one with 59840 entries without duplicates, 
since gdbm does not support duplicate keys, and one 
with 148397 entries with duplicates. Insertion refers 
to inserting all the elements, iteration to iterating 
over all of the elements 10 times, lookup to looking up 
each element, and deletion to removing all elements 
one after the other. 


Astonishingly, in many cases PSTL performs better 
than its competitors. We assume that one of the 
reasons is PSTL’s use of memory mapped files. 


Typically, Berkeley DB does not use memory 
mapped files since this would restrict the size of the 
database to the size of the address space. With the 
advent of 64-bit computers, however, we do not 
believe this to be a problem for PSTL. Berkeley DB 
only uses memory mapped files for databases opened 
read only and smaller than a given threshold. If the 
smaller Berkeley DB database is opened read only, 
iterating over the elements takes 6.43 seconds and 
looking them up takes 2.03 seconds. This does not 
show much potential for performance improvement 
of PSTL here. It is interesting to note that Berke- 
ley DB scales better for inserting new elements and 
PSTL scales better for iterating over elements and 
looking them up. This might be due to the fact 
that PSTL uses a red-black tree and Berkeley DB a 
B-tree. 


GNU gdbm uses a hash table for its internal 
representation. This is also apparent from its fast 
lookup speed, even though it uses normal disk I/O. 
Iteration over the elements is poor since gdbm only 
returns the key when iterating over the container re- 
quiring another lookup for the associated value. If 
the key is sufficient, the time for iterating over the 
elements is just 4.05 seconds. 


PSTL’s performance, however, for deleting the el- 
ements shows plenty of potential for improvement. 
This is due to the fact that PSTL uses a linear list 
for the management of free blocks sorted by their 
location within the file. Deleting all of the elements 
one after the other in random order populates the 
list with a huge number of elements even though 
adjacent free blocks are merged immediately. This 
problem will be attacked in future versions. 
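
The cost of this scheme can be seen in a small sketch of an address-ordered free list with immediate merging. The class free_list below is hypothetical and keeps the list in a std::list rather than in the mapped file, but it follows the description above: every release walks the list from the front to find the insertion point, so releasing many blocks in random order takes time quadratic in the number of free blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

struct free_block {
    std::size_t offset;
    std::size_t size;
};

// Hypothetical free list kept sorted by offset.
class free_list {
    std::list<free_block> blocks_;
public:
    void release(std::size_t offset, std::size_t size) {
        assert(size > 0);
        // Linear search for the first block located behind the released one.
        std::list<free_block>::iterator pos = blocks_.begin();
        while (pos != blocks_.end() && pos->offset < offset) ++pos;

        free_block b = {offset, size};
        std::list<free_block>::iterator cur = blocks_.insert(pos, b);

        // Merge with the following block if the two are adjacent.
        std::list<free_block>::iterator next = cur;
        ++next;
        if (next != blocks_.end() && cur->offset + cur->size == next->offset) {
            cur->size += next->size;
            blocks_.erase(next);
        }
        // Merge with the preceding block if the two are adjacent.
        if (cur != blocks_.begin()) {
            std::list<free_block>::iterator prev = cur;
            --prev;
            if (prev->offset + prev->size == cur->offset) {
                prev->size += cur->size;
                blocks_.erase(cur);
            }
        }
    }
    std::size_t count() const { return blocks_.size(); }
};
```

Merging keeps the list short only when neighboring blocks happen to be free, which is rarely the case while elements are deleted in random order.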


6 Related Work 


Currently, several persistent container class libraries 
are available. The most prominent are the various 
*dbm databases. 


6.1 dbm and Variants 

dbm is one of the oldest libraries that provide 
access to persistent containers. Under Unix, 
dbm [Unia] and its newer variants (ndbm [Unib], 
gdbm [GND99], Berkeley DB [Sle00]) are a de-facto 
standard for persistent information storage. 


The most advanced of the dbm databases is Berke- 
ley DB. Unlike the older variants, it does not only 
provide a hash table for its internal representation, 
but also a B-tree and two different kinds of queue 
formats as well as optional transaction management. 
Additionally, it provides bindings for C, C++, Java, 
and Tcl. 


Berkeley DB requires the elements to be stored in 
the database to be serialized before the database al- 
locates the element’s memory forcing the element 
to be copied twice. Additionally, the serialization 
function does not have access to Berkeley’s memory 
allocator and is forced to store the whole element in 
one chunk. On the other hand, this approach en- 
sures that the database cannot be easily corrupted 
by the serialization algorithm. 


Even though Berkeley DB provides C++ bindings, 
it provides a very low level API like the older dbm 
variants. Most of its functions work on DBTs (Data 
Base Thangs) representing raw regions of memory. 
Whenever a key/value pair needs to be stored in a 
database, both key and value need to be converted 
into a DBT. Additionally, when a value correspond- 
ing to a key needs to be requested, the key needs to 
be converted into a DBT and the database returns 
a DBT representing the value. 


PSTL, however, has an API compatible with STL 
and the serialization function is part of the con- 
tainer’s type. Besides that, the serialization func- 
tion is called transparently and the user does not 
have to deal with it. Based on our work on PSTL 
it might be fairly simple to provide a C++ API for 
Berkeley DB which is compatible with STL too. 


6.2 Disk Based Container 


The disk containers presented in [Nel98] use the 
same approach as used by the *dbm databases to 
access elements stored in its containers. Similar to 
PSTL, the elements in the database are referenced 
by offsets instead of pointers. 


Unfortunately, this work seems to be more an 
experiment on how persistency could be achieved in 
C++ without going into much detail. For instance, 
the management of the free blocks in the container 
is overly simple and only provides a fixed-size block 
allocation scheme. This makes it difficult and inef- 
ficient to use these containers in combination with 
variable sized elements. The library is also incom- 
patible with the containers provided by STL. 


6.3 Persistent Template Library 


The Persistent Template Library is presented in 
[Ste98b] and [Ste98a]. PTL is a library that pro- 
vides containers compatible with the ones provided 
by STL. Unfortunately, the containers have some 
severe drawbacks: 


• Whenever a PTL container is allocated it is 
  copied from disk into main memory. Afterwards 
  it behaves like the STL containers. This 
  is trivial since it uses the same code. When the 
  container is destroyed, its data is written back 
  to disk. 


• Due to the above, the size of the container is 
  limited to the size of the main memory. A 
  persistent container, however, should be able to 
  accommodate more elements than fit into the 
  memory. This demand is essential for programs 
  that have to manage huge amounts of data. 


• Only one process may instantiate a container 
  at one time since the container is copied into 
  main memory and other processes will see the 
  changes only when it is saved back to disk at 
  destruction. 


This work is interesting because the author comes 
to the conclusion that it is not possible to use the 
allocator mechanism provided by STL to add per- 
sistency. The author has solved the problem by 
subclassing the corresponding container provided by 
the STL and by supplying his own constructors for 
reading the elements from disk and destructor for 
writing them back. This, unfortunately, led to the 
above disadvantages. 


6.4 POST++ 


While POST++ does not provide persistent con- 
tainers directly, it provides a simple and effective 
storage for application objects [Kni99] and supports 
the use of different storages for different objects. 
Except for a slight instrumentation of the data
structures to be stored persistently, POST++ trans- 
parently manages the persistence of the objects. 
For the instrumentation of the data structures, 
POST++ uses C preprocessor macros that regis- 
ter the attributes to be made persistent. A special 
macro is provided for the identification of pointers as 
their management is more complex. Due to the ex- 
plicit identification of pointers, POST++ even pro- 
vides garbage collection to reclaim unused storage. 


POST++ uses a different but nevertheless interest- 
ing approach. The choice whether to use PSTL or 
POST++ depends largely on the application domain.
For instance, POST++ does not provide
persistent containers per se, it only provides the in- 
frastructure to make your objects (including con- 
tainer objects) persistent. Thus, if special purpose 
data structures need to be made persistent and their 
instrumentation is possible, POST++ might be a 
good choice. Otherwise, we recommend the use of 
PSTL. 


6.5 Texas 


Texas [SKW92] is a persistent storage system sim- 
ilar to POST++ but instead of requiring the user 
to instrument the data structures, it uses a pointer 
swizzling technique in combination with runtime 
type descriptors and slightly modified heap alloca- 
tion routines. Runtime type descriptors are gener- 
ated using an optional feature of gcc. 


Objects are either allocated on the conventional 
(transient) heap, or the persistent heap. In the im- 
plementation presented in [SKW92], all the objects 
allocated on the persistent heap are stored within a 
single file, but the authors claim that it would not 
be difficult to remove this restriction. 


While Texas provides a simple and powerful way to 
manage data structures located on persistent stor- 
age, it has some shortcomings with regards to shar- 
ing the persistent database. If several different pro- 
cesses need to share the same database, they need 
to share the same persistent page mappings. While 
this might be simple in case of a single file used for 
all persistent objects, it cannot be easily achieved if 
different files are used for different persistent objects 
and the files to be shared between the processes (or 
the processes sharing the files) cannot be known a 
priori. 


7 Future Work


For the maintenance of free blocks, PSTL uses a lin- 
ear list, sorted by the memory location of the free 
blocks. This representation is well suited to
debugging since it provides a simple way
to check the allocation status of the whole file and 
whether the free list has been corrupted (e.g., by a 
method writing past its allocated memory area). As 
we have shown in Section 5.2, however, it is ineffi- 
cient from a performance point of view. To alleviate 
this problem, we plan to provide a better allocator 
class using a binary tree for the management of free 
blocks. 


PSTL does not yet implement all the containers 
efficiently. It might be interesting to see whether 
PSTL could profit from a data structure optimized
for block-sized access patterns, such as a B-tree or
B+-tree [Com79], or whether the operating system’s
cache management is sufficient. 


We also plan to provide an associative container op- 
timized for lookups using a hash table. It might also 
be interesting to see whether gdbm’s algorithms can 
be reused for this container to achieve not only inter- 
operability with STL, but also with the most widely 
used type of persistent container. 


In future versions of PSTL we will also try to imple- 
ment a transparent locking mechanism as explained 
in Section 4.4. 


8 Conclusions 


Persistent STL (PSTL) containers provide a vari- 
ety of benefits not available with existing persis- 
tent container implementations available for C++. 
They provide a variety of different persistent con- 
tainers, allowing the container’s size to grow beyond 
the available memory, and supporting STL’s object- 
oriented programming model known to many C++ 
programmers. 


In this paper we have shown the problems of adding 
persistency to STL and demonstrated why it is im- 
possible to use STL’s allocator mechanism to add 
persistency to existing STL containers. This im- 
possibility forced us to implement our own con- 
tainer classes that are interface compatible with 
STL. Based on this implementation, we have presented
how STL’s design had to be extended to support
persistency.


Based on PSTL, we explained how to implement 
a transparent serialization facility without break- 
ing interface compatibility with the corresponding 
STL containers and without requiring any support 
from the objects to be stored in the persistent con- 
tainer. We solved this using a traits approach that 
encapsulates the serialization functionality and al- 
lows different containers to use different serialization 
algorithms. 


Except for the declaration of the container, ele- 
ments are serialized and deserialized transparently. 
This has been solved using a reference wrapper that 
works in combination with both objects using vir- 
tual and non-virtual member functions. Addition- 
ally, the persistent allocator class used by PSTL’s 
containers provides a simple but efficient means to 
create new persistent container classes for special 
purpose applications. 


In an evaluation of the library we found that
STL containers can easily be replaced with PSTL
containers. Only minor modifications are necessary
where code makes implicit assumptions about the container’s
implementation. Such cases, however, are detected by the
C++ compiler. We have also included a perfor- 
mance evaluation which will serve as a basis for fu- 
ture improvements of PSTL. 


Availability


PSTL is distributed under the GNU Public License 
and is available at
http://www.infosys.tuwien.ac.at/NewsCache/pstl.html.


Abstract


Today, mobility and persistence are important aspects 
of distributed applications. They have many fields of 
use such as load balancing, fault tolerance and dynamic 
reconfiguration of applications. In this context, the Java 
virtual machine provides many useful services such as 
dynamic class loading and object serialization which 
allow Java code and objects to be mobile or persistent. 
However, Java does not provide any service for the 
mobility or the persistence of control flows (threads), 
the execution state of a Java program remains 
inaccessible. 

We designed and implemented new services that make 
Java threads mobile or persistent. With these services, a 
running Java thread can, at an arbitrary state of its 
execution, migrate to a remote machine or be 
checkpointed on disk for a possible subsequent 
recovery. 

Therefore migrating a Java thread is simply performed 
by the call of our go primitive, by the thread itself or by 
an external thread. In other words, the migration or the 
checkpointing of a thread can be initiated by the thread 
itself or by another thread. 

We integrated these services into the JVM, so they 
provide reasonable and competitive performance 
figures without inducing an overhead on JVM 
performance. Finally, we experimented with a dynamic
reconfiguration tool based on our mobility service and 
applied to a running distributed application. 

Keywords: mobility, migration, persistence,
checkpointing, recovery, Java, thread, JVM


1. Introduction 


Today, mobility and persistence are important 
aspects of distributed applications and have several 
fields of use [Milojicic99] [Ambler99]. Application 
mobility can be used to dynamically balance the load 
between several machines in a distributed system 
[Nichols87], to reduce network traffic by moving
clients closer to servers [Douglis92], to dynamically
reconfigure distributed applications [Hofmeister93], to
implement mobile agent platforms [Chess95] or as a
machine administration tool [Oueichek96]. Application
persistence can be used for fault tolerance [Wojcik95]
or for application debugging.

Distributed applications development is an 
important research direction in computing systems. In 
this context, the object paradigm has proven to be well 
suited to distributed applications development and the 
Java Virtual Machine (JVM) is now considered as a 
reference platform [Gosling96]. The Java compiler
produces bytecode, an intermediate code that is
interpreted by the JVM. Today, the JVM has been ported to
almost every platform and can therefore be viewed as a
universal machine. 

In order to facilitate the development of 
distributed applications, the JVM provides several
services [Sun00a], among which:

• Object serialization. The serialization service
allows the transfer of Java objects between several
nodes or the storage of objects on disk.

• Dynamic class loading. The dynamic class loading
service enables the transfer of Java code between
several nodes.

Therefore, Java provides useful services for the 
mobility and the persistence of code and data. However, 
Java does not provide any service enabling the mobility 
and the persistence of applications during their 
execution. Thus, if a running Java application migrates 
to a new location, only using object serialization and 
dynamic class loading, the execution state of the 
application is lost. In other words, when arriving at its
new location, the migrating application can access its
code and its restored data but it has to restart the
execution from the beginning. Consequently, the 
provided Java services are not sufficient for either 
enabling dynamic load balancing of distributed Java 
executions or allowing the state of running applications 
to be checkpointed and then recovered. 
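
The limitation described above can be illustrated with a minimal sketch that uses only standard Java serialization. The CountingTask class and its fields are hypothetical names introduced for this illustration; they are not part of our services. The object field sum survives the transfer, whereas the loop index i, which lives in the stack frame of run, does not.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical task: adds the integers 0..limit-1 to a running sum.
class CountingTask implements Serializable {
    private static final long serialVersionUID = 1L;
    final int limit;
    long sum = 0;   // object state: survives serialization

    CountingTask(int limit) {
        this.limit = limit;
    }

    // The loop index i lives in the stack frame of run(), not in the object.
    long run(int stopAfter) {
        for (int i = 0; i < limit; i++) {
            sum += i;
            if (i == stopAfter) {
                break;   // simulates the point at which the task is interrupted
            }
        }
        return sum;
    }
}

class SerializationDemo {
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        CountingTask task = new CountingTask(10);
        task.run(4);   // sum = 0+1+2+3+4 = 10, interrupted at i == 4

        CountingTask copy = (CountingTask) deserialize(serialize(task));
        System.out.println("sum after transfer: " + copy.sum);    // prints 10

        // The copy has no record of i, so run() starts again at i = 0.
        System.out.println("sum after rerun: " + copy.run(9));    // prints 55, not 45
    }
}
```

The rerun yields 55 instead of the expected 45 because the first five iterations are executed twice: the data was transferred, but the position in the loop was not.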

We designed and implemented new services that 
make Java threads, i.e. executions, mobile or persistent. 
With these services, a running Java thread can, at an 
arbitrary state of its execution, migrate to a remote
machine or be checkpointed on disk for a possible 
subsequent recovery. 

Our java.lang.threadpack Java package provides 
several primitives, among which go performs thread 
migration, store is used for thread checkpointing and 
load for thread recovery. Therefore migrating a Java 
thread is simply performed by the call of the go 
primitive, by the thread itself or by an external thread. 
In other words, the migration or the checkpointing of a 
thread can be initiated by the thread itself or by another 
thread. 

We integrated these services into the JVM, so they 
provide acceptable performance figures without 
inducing an overhead on JVM performance. Finally, we 
experimented with a prototype implementation of a
dynamic reconfiguration tool based on our mobility
service and applied to a running distributed application.

The rest of this paper consists of three main parts. 
We first describe our service for capturing/restoring 
Java thread state in section 2 and then present the 
services of thread mobility and thread persistence in 
section 3. In sections 4 and 5, we respectively present 
performance figures and describe some experiments 
that we performed with our services. Finally, we 
discuss related work and present our conclusions and 
future directions in sections 6 and 7.


2. Thread state capture/restoration service 


Both services allowing the mobility and the 
persistence of Java threads are based on a common 
service: a thread state capture/restoration service. We 
first describe the representation of a thread state in the 
JVM and then present the design principles of our 
capture/restoration service and its implementation 
details. 


2.1. Java thread state 


The JVM can support the concurrent execution of 
several threads [Lindholm96]. The state of a Java 
thread is illustrated by figure 1; it consists of three main
data structures: 

• The Java stack. A Java stack is associated with
each thread in the JVM. The Java stack consists of
a succession of frames (see figure 2). A new frame
is pushed onto the stack each time a Java method is
invoked and popped from the stack when the
method returns. A frame includes the local
variables of the associated method and the partial
results of this method. The values of local variables
and partial results may be of several types: integer,
float, Java reference, etc. A frame also contains
registers such as the program counter and the top of
the stack.

• The object heap. The heap of the JVM includes all
the Java objects created during the lifetime of the
JVM. The heap associated with a thread consists of
all the objects used by the thread (objects
referenced in the thread’s Java stack).

• The method area. The method area of the JVM
includes all classes (and their methods) that have
been loaded by the JVM. The method area
associated with a thread contains the classes used
by the thread (classes some of whose methods are
referenced by the thread’s Java stack).


[Diagram: Java stack, object heap, method area]
Figure 1: Java thread state


[Diagram: Java stack]
Figure 2: Java frame





2.2. Design of the capture/restoration service 


Here are the design principles and the design 
decisions of our Java thread state capture/restoration 
service. 


2.2.1. Design principles 


The thread state capture/restoration service 
enables, on the one hand, the capture of the current 
state of a running thread, and on the other hand, the 
restoration of a previously captured state in a new 
thread: the new thread starts running at the point at 
which the execution of the previous thread was 
interrupted. 

Thread state capture consists in interrupting the 
thread during its execution and extracting its current 
state. The extraction amounts to building a data structure (a
Java object) containing all information necessary for 
restoring the Java stack, the heap and the method area 
associated with the thread. To build such a data 
structure, the Java stack associated with the thread is
scanned in order to identify the objects and the classes 
that are referenced from the stack. After state capture, 
the resulting data structure can be serialized and sent to 
another virtual machine in order to implement mobility 
or it can be stored on disk for persistence purposes.

One of our motivations was to provide a generic 
service which allows the implementation of various 
capture policies. Consequently we rely on Java object 
serialization and dynamic class loading features in 
order to capture respectively the heap and the method 
area. 

The restoration of a thread consists first in 
creating a new thread and initializing its state with a 
previously captured state. After that, the Java stack, the 
heap and the method area associated with the new 
thread are identical to those associated with the thread 
whose state was previously captured. Finally, the new 
thread is started and resumes the execution of the
previous thread. 


2.2.2. Design decisions 


There were two main problems in designing a
Java thread state capture/restoration service. The first 
issue is to have access to the state of Java threads, a 
state that is internal to the JVM. The second issue is 
that the state of Java threads is not portable on 
heterogeneous architectures. 


2.2.2.1. Non accessible state 


The state of Java threads is internal to the JVM. 
This state is not accessible by Java programs and can 
therefore not be directly captured. In order to allow the 
capture of thread state, we extended the JVM and
externalized the state of Java threads. 


2.2.2.2. Non portable state 


Unlike the heap and the method area that consist 
of information portable on heterogeneous architectures 
(Java objects and Java classes), the Java stack is a 
native data structure (C structure). The representation of 
the information contained in the Java stack depends on 
the underlying architecture. The thread state capture 
service must translate this non portable data structure 
(C structure) to a portable data structure (Java object). 

Translating the Java stack into a portable data 
structure consists more precisely in translating the 
native values of local variables and partial results 
(figure 2) into Java values. This translation requires the 
knowledge of the types of the values. But the Java stack 
does not provide any information about the types of the 
values it contains: a four bytes word may represent a 
Java reference as well as an int value or a float value. 
The thread state capture service must recognize the 


types of the values contained in the Java stack. We 

propose two approaches for type recognition: 

• Type recognition at runtime. The first approach
consists in recognizing the types during runtime.
The type information is built in parallel with thread
execution. The type information is updated each
time a bytecode instruction is interpreted by the
thread because the bytecode instructions are typed
and are applied to particular types [Lindholm96].
Therefore, at state capture time, the type
information is available. The drawback of this
approach is that it induces an overhead on
application performance.

• Type recognition at capture time. The second
approach consists in recognizing the types at
capture time. The type information is built by
analyzing the bytecode to determine the execution
path of the thread. This analysis is similar to the
bytecode verifier algorithm [Lindholm96]. This
approach avoids any overhead on application
performance but causes a latency due to type
information building.
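
The first approach can be illustrated with a toy stack machine. This is a sketch under strong simplifying assumptions, not the mechanism implemented in our extended JVM: the four opcodes, the SlotType tags and the TypedStackMachine class are hypothetical names. It shows how a shadow type stack, updated on every typed instruction, makes it possible at capture time to decode untyped 32-bit words into portable Java values.

```java
import java.util.ArrayList;
import java.util.List;

// Toy opcodes named after typed JVM bytecodes; this is not a real interpreter.
enum Op { ICONST, FCONST, IADD, FADD }

enum SlotType { INT, FLOAT }

class TypedStackMachine {
    private final int[] words = new int[16];            // untyped 32-bit words, as on a native stack
    private final SlotType[] types = new SlotType[16];  // shadow stack: one type tag per word
    private int top = 0;

    // Executes one instruction and updates the type tags as a side effect.
    void execute(Op op, int operandBits) {
        switch (op) {
            case ICONST:
                push(operandBits, SlotType.INT);
                break;
            case FCONST:
                push(operandBits, SlotType.FLOAT);
                break;
            case IADD: {
                int b = pop();
                int a = pop();
                push(a + b, SlotType.INT);
                break;
            }
            case FADD: {
                float b = Float.intBitsToFloat(pop());
                float a = Float.intBitsToFloat(pop());
                push(Float.floatToIntBits(a + b), SlotType.FLOAT);
                break;
            }
        }
    }

    private void push(int word, SlotType type) {
        words[top] = word;
        types[top] = type;
        top++;
    }

    private int pop() {
        return words[--top];
    }

    // Capture: each raw word is decoded into a portable Java value using its tag.
    List<Object> capture() {
        List<Object> portable = new ArrayList<>();
        for (int i = 0; i < top; i++) {
            if (types[i] == SlotType.INT) {
                portable.add(Integer.valueOf(words[i]));
            } else {
                portable.add(Float.valueOf(Float.intBitsToFloat(words[i])));
            }
        }
        return portable;
    }
}

class TypeRecognitionDemo {
    public static void main(String[] args) {
        TypedStackMachine m = new TypedStackMachine();
        m.execute(Op.ICONST, 2);
        m.execute(Op.ICONST, 3);
        m.execute(Op.IADD, 0);
        m.execute(Op.FCONST, Float.floatToIntBits(1.5f));
        m.execute(Op.FCONST, Float.floatToIntBits(2.0f));
        m.execute(Op.FADD, 0);
        System.out.println(m.capture());   // prints [5, 3.5]
    }
}
```

The extra write to the types array on every push is the per-instruction overhead mentioned above; the second approach would instead rebuild the same tags by analyzing the bytecode once, at capture time.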

We first implemented the approach based on type 
recognition at runtime [Bouchenak00] and then 
implemented the approach based on type recognition at 
capture time*. In this paper, we focus our attention on
the design principles of our services and do not tackle 
the implementation details. 


2.3. Implementation of the capture/restoration service


Our Java thread state capture/restoration service 
was integrated into the Java 2 SDK (formerly called
JDK 1.2) [Sun00a]. Our new Java package, called 
java.lang.threadpack, provides many classes such as 
the ThreadState class whose instances represent the 
state of Java threads and the ThreadStateManagement 
class that provides the necessary features for capturing 
and restoring Java threads state. 

Figure 3 illustrates a part of the application 
programming interface (API) of the 
ThreadStateManagement class. The capture method 
allows the capture of the current state of a Java thread;
the captured state is returned by this method
as a ThreadState object. Symmetrically, the restore
method creates a new Java thread, initializes its state 
with the ThreadState argument, starts the new thread 
and returns it as a result of the method. The new thread
resumes the execution of the thread whose state was
previously captured and passed as an argument of the
restore method.

* The evaluation presented in section 4 concerns the
first approach.


java.lang.threadpack

Class ThreadStateManagement

public final class ThreadStateManagement extends Object

The ThreadStateManagement class provides several
useful services for the capture and restoration of Java
thread states.

static ThreadState  capture(Thread thread)
                    Captures the state of the thread
                    argument and returns it as a
                    ThreadState object.

static Thread       restore(ThreadState threadState)
                    Creates a new Java thread,
                    initializes it with a previously
                    captured state and starts its
                    execution.

static void         captureAndSend(Thread thread,
                                   SendInterface sndItf,
                                   boolean toStop)
                    Captures the state of the thread
                    argument and sends it (to a remote
                    node or to the disk) by calling the
                    sendState method of the
                    SendInterface interface.

static Thread       receiveAndRestore(ReceiveInterface rcvItf)
                    Receives the state of a Java
                    thread by calling the receiveState
                    method of the ReceiveInterface
                    interface, creates a new Java
                    thread, initializes it with the
                    received state and starts its
                    execution.

Figure 3: Interface of the
capture/restoration service





The captureAndSend and receiveAndRestore
methods are generic: they allow the capture and
restoration operations to be specialized to application needs. Besides
capturing the state of a Java thread, the captureAndSend
method allows the programmer to specify the way the
captured state is handled: the captured state can for
example be sent to a remote machine for a mobility
purpose, it can be stored on disk in the case of
application persistence, etc. The specialization of the
handling of the captured state is specified by the second
argument of the captureAndSend method. In fact, this
argument implements our SendInterface interface and
so provides a sendState method that is called by our
captureAndSend method, just after the capture of the
thread state. The third argument of the captureAndSend
method is a boolean that specifies whether the thread whose
state is captured is stopped or resumed after the capture. This argument
is for example set to true in the case of thread migration
and is set to false for remote thread cloning.

Symmetrically, the receiveAndRestore method 
specifies the way a thread state is received before it is 
restored: the state can for example be received from a 
remote machine, it can be read from disk, etc. The 
specialization of the way the thread state is received is 
possible thanks to the argument of the 
receiveAndRestore method: this argument implements 
our ReceiveInterface interface and so provides a
receiveState method that is called by our 
receiveAndRestore method, just before the restoration 
operation. 
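
The following sketch illustrates this callback design with standard Java only. Since a real ThreadState can only be produced inside the extended JVM, the StateStub class is a hypothetical stand-in, the capture step is faked, and the StateSender and StateReceiver interfaces merely mirror the roles of SendInterface and ReceiveInterface with assumed signatures. The handlers shown here use a file, which corresponds to the persistence case.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

// Stand-in for ThreadState: a real one is built inside the extended JVM.
class StateStub implements Serializable {
    private static final long serialVersionUID = 1L;
    final String threadName;
    final int frameCount;

    StateStub(String threadName, int frameCount) {
        this.threadName = threadName;
        this.frameCount = frameCount;
    }
}

// Hypothetical counterparts of SendInterface and ReceiveInterface.
interface StateSender {
    void sendState(StateStub state) throws IOException;
}

interface StateReceiver {
    StateStub receiveState() throws IOException, ClassNotFoundException;
}

class FileSender implements StateSender {
    private final Path file;

    FileSender(Path file) {
        this.file = file;
    }

    public void sendState(StateStub state) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(state);
        }
    }
}

class FileReceiver implements StateReceiver {
    private final Path file;

    FileReceiver(Path file) {
        this.file = file;
    }

    public StateStub receiveState() throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (StateStub) in.readObject();
        }
    }
}

class GenericService {
    // Capture is faked: the caller supplies the state, only the hand-off is shown.
    static void captureAndSend(StateStub captured, StateSender handler) throws IOException {
        handler.sendState(captured);     // called just after the capture
    }

    // Restoration is faked too: the received state is simply returned.
    static StateStub receiveAndRestore(StateReceiver handler)
            throws IOException, ClassNotFoundException {
        return handler.receiveState();   // called just before the restoration
    }
}

class HandlerDemo {
    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("thread-state", ".ser");
        GenericService.captureAndSend(new StateStub("worker", 3), new FileSender(file));
        StateStub restored = GenericService.receiveAndRestore(new FileReceiver(file));
        System.out.println(restored.threadName + " with " + restored.frameCount + " frames");
        Files.delete(file);
    }
}
```

Because the generic methods depend only on the two interfaces, replacing the file-based handlers with network-based ones changes the transport without touching the capture and restoration logic.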


3. Thread mobility and thread persistence 
services 


Besides our system service for capturing/restoring 
the state of Java threads, we provide higher-level 
services for the mobility and the persistence of Java 
applications. 

Making an application mobile is the action of 
moving an application, during its execution, from one 
node to another: the application starts running, on the 
new node, at the point at which the execution was 
interrupted on the first node. Therefore, making a Java 
application mobile consists in making the underlying 
Java thread mobile. In the case of a multi-threaded 
application, the whole group of threads has to be 
moved. 

Making a thread mobile is the action of capturing 
the current state of the thread, sending this state to a 
target machine and restoring the state in a new thread 
on the target machine: the new thread resumes the 
execution in the state left by the original thread. 

In the same way, application persistence consists 
first in saving the current state of a running application 
on stable storage (disk). The saved state can 
subsequently be restored in order to resume the 
execution of the application. Therefore, making a Java 
application persistent consists in making the underlying 
Java thread(s) persistent. 

Making a thread persistent is, first, the action of 
capturing the current state of the thread and saving it on 
disk and then, the ability to restore the saved state in
a new thread: the new thread resumes the execution of 
the previous thread. 

Our MobileThreadManagement class belongs to 
the java.lang.threadpack package and provides services 
necessary for the mobility of Java threads. Figure 4.a 
illustrates a part of the application programming 
interface of this class. The go method allows the
transfer of a running Java thread to a Java virtual 
machine identified by an IP address and a port number. 
The arrive method enables the reception of a Java
thread coming from a machine specified by an IP 
address and a port number. 


java.lang.threadpack

Class MobileThreadManagement

public final class MobileThreadManagement extends
Object

The MobileThreadManagement class provides
services for the mobility of Java threads.

static void     go(Thread thread,
                   String targetHost,
                   int targetPort)
                Moves the execution of the
                thread argument to the machine
                specified by the host name and the
                port number arguments.

static Thread   arrive(String sourceHost,
                       int sourcePort)
                Receives a thread from the
                machine specified by the host
                name and the port number
                arguments.

Figure 4.a: Service for the mobility of Java
threads


public static void go(Thread thread, String targetHost,
                      int targetPort) {
    MySender sndItf = new MySender(targetHost, targetPort);
    // toStop is true: the local thread is stopped once
    // its state has been captured (migration).
    ThreadStateManagement.captureAndSend(thread, sndItf, true);
}

public static Thread arrive(String sourceHost,
                            int sourcePort) {
    MyReceiver rcvItf = new MyReceiver(sourceHost, sourcePort);
    return ThreadStateManagement.receiveAndRestore(rcvItf);
}


Figure 4.b: Implementation of mobility service





The go and arrive methods are implemented using 
respectively the captureAndSend and 
receiveAndRestore generic methods (see section 2.3). 
The go method is implemented as follows: 

• The go method calls the captureAndSend method
(figure 4.b).

• The captureAndSend method is adapted using an
instance of the MySender class.

• The MySender class implements the
SendInterface interface and therefore provides
a sendState method; this method establishes
a connection to a machine and sends
the ThreadState object using serialization
(figure 4.c).


class MySender
        implements SendInterface {

    String host;
    int port;

    MySender(String host, int port) {
        this.host = host;
        this.port = port;
    }

    public void sendState(ThreadState state) {
        // Send state to <host, port>.
    }
}


class MyReceiver
        implements ReceiveInterface {

    String host;
    int port;

    MyReceiver(String host, int port) {
        this.host = host;
        this.port = port;
    }

    public ThreadState receiveState() {
        // Receive a state from <host, port> and return it.
    }
}


Figure 4.c: Implementation of mobility service


The arrive method is implemented as follows:

• The arrive method calls the receiveAndRestore
method (figure 4.b).

• The receiveAndRestore method is adapted using an
instance of the MyReceiver class.

• The MyReceiver class implements the
ReceiveInterface interface and therefore
provides a receiveState method; this method
establishes a connection to a machine and
receives a ThreadState object using
deserialization (figure 4.c). The classes associated
with this ThreadState object are received relying
on the Java dynamic class loading service.
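
The transport part of such a sender and receiver can be sketched with standard sockets and object streams. This is an illustration under stated assumptions rather than our implementation: CapturedState is a hypothetical stand-in for ThreadState, the SocketSender and SocketReceiver classes only play the roles of MySender and MyReceiver, the receiver listens for a connection instead of connecting to the source, and both ends run in one JVM over the loopback interface, so dynamic class loading is not exercised.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Stand-in for a captured thread state; the real one needs the extended JVM.
class CapturedState implements Serializable {
    private static final long serialVersionUID = 1L;
    final String threadName;
    final int frameCount;

    CapturedState(String threadName, int frameCount) {
        this.threadName = threadName;
        this.frameCount = frameCount;
    }
}

// Plays the role of MySender: connects to <host, port> and serializes the state.
class SocketSender {
    private final String host;
    private final int port;

    SocketSender(String host, int port) {
        this.host = host;
        this.port = port;
    }

    void sendState(CapturedState state) throws IOException {
        try (Socket socket = new Socket(host, port);
             ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream())) {
            out.writeObject(state);
        }
    }
}

// Plays the role of MyReceiver; in this sketch it accepts one connection.
class SocketReceiver {
    private final ServerSocket server;

    SocketReceiver(ServerSocket server) {
        this.server = server;
    }

    CapturedState receiveState() throws IOException, ClassNotFoundException {
        try (Socket socket = server.accept();
             ObjectInputStream in = new ObjectInputStream(socket.getInputStream())) {
            return (CapturedState) in.readObject();
        }
    }
}

class MigrationDemo {
    // Sends one state over the loopback interface and returns what was received.
    static CapturedState roundTrip(CapturedState state) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            String host = server.getInetAddress().getHostAddress();
            int port = server.getLocalPort();
            Thread sender = new Thread(() -> {
                try {
                    new SocketSender(host, port).sendState(state);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            sender.start();
            CapturedState received = new SocketReceiver(server).receiveState();
            sender.join();
            return received;
        }
    }

    public static void main(String[] args) throws Exception {
        CapturedState received = roundTrip(new CapturedState("worker", 42));
        System.out.println(received.threadName + ": " + received.frameCount + " frames");
    }
}
```

Between two distinct JVMs, the receiving side would additionally have to obtain the classes referenced by the state, which is the part handled by dynamic class loading in the description above.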

We can also imagine go and arrive methods that 
rely on the Wireless Application Protocol instead of IP 
in order to perform thread migration between JVMs
installed on wireless hosts [WAPFactory00]. 


java.lang.threadpack

Class PersistentThreadManagement

public final class PersistentThreadManagement
extends Object

The PersistentThreadManagement class provides
several useful services for making Java threads
persistent.

Method Summary

static void     store(Thread thread,
                      String fileName)
                Saves the state of the thread
                argument in the file specified by
                the name argument.

static Thread   load(String fileName)
                Restores the execution of a Java
                thread from the state stored in the
                file specified by the name
                argument.

Figure 5: Service for the persistence of Java
threads





In the same way, the
PersistentThreadManagement class provides several services
for the persistence of Java threads. A part of its
application programming interface is illustrated by
figure 5. The store method saves the current state of a
Java thread in a file specified by a name and the load
method restores a Java thread from a state saved in a
file identified by a name. These two methods are also
implemented using our captureAndSend and
receiveAndRestore generic methods.

Finally, the MobileThreadManagement and
PersistentThreadManagement classes are two possible
adaptations of our generic service of Java thread state
capture/restoration. In the same way and for a particular
application, our generic service can be adapted to build
tools that meet the application's needs.


4. Evaluation 


This section first presents the performance figures 
of our thread state capture/restoration service. The cost 
of migrating a Java thread between two machines and 
the cost of checkpointing/recovering a thread are then 
presented. Finally, a comparison between the results of 
benchmarking our extended JVM and the standard JVM 
is described. Our evaluation environment is as follows: 

• JDK 1.2.2,

• Solaris 2.6, Sun Ultra-1 (Sparc Ultra-1
167 MHz),

• Ethernet 100 Mb/s.


4.1. Basic costs 


The time spent in capturing/restoring a Java thread 
state depends on the size of the state at capture time. 
The size of a Java thread depends on the number and 
the size of the frames pushed onto the Java stack 
associated with the thread. In the following, we focus 
our attention on the influence of the number of frames 
on the cost of our services. In order to vary the number 
of frames pushed onto the thread's Java stack, we used a
recursive program (the factorial function). 


[Plot; x-axis: Number of frames; series: Capture]


Figure 6: Thread state capture


Figure 6 describes the variation of the cost of a 
thread state capture operation according to the number 
of frames on the thread's Java stack at capture time. The
cost of a capture operation is less than 1 ms when the
number of frames is lower than 10. This cost reaches
2 ms when the number of frames is 20 and 9 ms when
the number of frames is 80. 

Figure 7 presents the cost of a thread state 
restoration operation when varying the number of 
frames on the thread's Java stack at capture time. The curve
shows that the cost of a restoration operation is less
than 1 ms when the number of frames is lower than 80.


[Plot; x-axis: Number of frames; series: Restoration]


Figure 7: Thread state restoration


Overall, the costs of capturing and restoring
a Java thread's state are acceptable, especially in the
case of threads with few frames on the Java stack.


4.2. Evaluation of thread migration 


We measured the cost of a Java thread migration 
based on our thread mobility service. In figure 8, the 
solid curve represents the variation of the cost of a Java
thread migration operation according to the number of
frames on the thread's Java stack at migration time. The
dotted curve represents the cost of a thread state
transfer between two machines when varying the
number of frames on the thread's Java stack.


[Plot; x-axis: Number of frames; series: Migration, State transfer]


Figure 8: Thread migration


The cost of a thread migration varies linearly from
100 ms to 600 ms when the number of frames on the
thread's stack is between 1 and 100. This cost may seem
significant but it is mainly due to the cost of thread state
transfer, as shown by the two almost superimposed
curves. In fact, thread migration consists in capturing
the thread's state, sending this state to a destination
machine and restoring the state in a new thread on the
destination machine. Of these three steps, state transfer represents 98% of
the total cost of thread migration.








The transfer of a thread state to
a destination machine consists of first serializing the
state object in order to translate the object graph to a
byte array, then transferring the resulting array of bytes
over the network to the destination machine and finally
de-serializing the byte array on the destination machine
in order to rebuild the object graph. The state transfer
time can partly be reduced by using Java externalization
rather than serialization. Externalization allows the
application programmer to write their own object transfer
policy by saving only the information necessary for
rebuilding object graphs. Externalization may be up to
40% faster than serialization [Sun00c].
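
As an illustration of the difference between the two mechanisms, the following minimal sketch shows a class that defines its own transfer policy through java.io.Externalizable. The FrameState class, its fields and the ExternalizationSketch helper are hypothetical and are not taken from the thread state object described in this paper; they only show how a programmer can choose to transfer the fields needed to rebuild an object and recompute the rest.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical stand-in for one frame of a captured thread state.
class FrameState implements Externalizable {
    int methodId;        // assumed identifier of the executing method
    int programCounter;  // assumed index of the next instruction
    int[] locals;        // assumed local variable slots
    String label;        // derived field, rebuilt instead of transferred

    // Externalizable requires a public no-argument constructor.
    public FrameState() { }

    FrameState(int methodId, int programCounter, int[] locals) {
        this.methodId = methodId;
        this.programCounter = programCounter;
        this.locals = locals;
        this.label = makeLabel();
    }

    private String makeLabel() {
        return "m" + methodId + "@" + programCounter;
    }

    // Hand-written transfer policy: only the fields needed to rebuild the frame.
    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(methodId);
        out.writeInt(programCounter);
        out.writeInt(locals.length);
        for (int value : locals) {
            out.writeInt(value);
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        methodId = in.readInt();
        programCounter = in.readInt();
        locals = new int[in.readInt()];
        for (int k = 0; k < locals.length; k++) {
            locals[k] = in.readInt();
        }
        label = makeLabel(); // recomputed on the destination side
    }
}

class ExternalizationSketch {
    // Translates an object graph into a byte array.
    static byte[] toBytes(Object state) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(state);
        }
        return buffer.toByteArray();
    }

    // Rebuilds the object graph from a byte array.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}
```

Calling ExternalizationSketch.fromBytes(ExternalizationSketch.toBytes(frame)) yields an equivalent FrameState whose label has been rebuilt locally. Whether this is actually faster than default serialization depends on the object graph and should be measured.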


4.3. Evaluation of thread checkpointing and 
recovery 


Besides thread migration, we also measured the 
cost of checkpointing a running Java thread and saving 
its state on disk and the cost of recovering an execution 
from a state previously stored on disk. In figure 9, the 
solid curve represents the variation of the cost of a Java 
thread checkpointing operation according to the number 
of frames on the thread's Java stack at checkpointing 
time. The dotted curve represents the cost of writing a 
thread state on disk according to the number of frames 
on the thread's Java stack. In figure 10, the solid curve 
represents the cost of a Java thread recovery and the 
dotted curve illustrates the cost of reading a thread 
state from disk. 


[Plot: x-axis: number of frames]


Figure 9: Thread checkpointing 


We notice, on the one hand, that the cost of thread
checkpointing and the cost of thread recovery increase
linearly when the number of frames on the thread's Java
stack increases. On the other hand, 97% of the time of
thread checkpointing is spent writing the thread's state to
disk and 99% of the time of thread recovery is spent
reading the thread's state from disk. As explained in
section 4.2, the costs of serialization/de-serialization
can be decreased by using externalization. The
performance of thread checkpointing can also be
improved by performing asynchronous disk writing.
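
One possible way to perform asynchronous disk writing is sketched below. It assumes that the thread state has already been serialized to a byte array and uses the java.util.concurrent and java.nio.file APIs of a recent JDK, which were not available in JDK 1.2.2. The AsyncCheckpointWriter class is hypothetical and is not part of the services described in this paper.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: writes serialized thread states on a background thread
// so that the checkpointed thread does not wait for the disk.
class AsyncCheckpointWriter implements AutoCloseable {
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    // Returns immediately; the Future completes once the bytes are on disk.
    Future<Path> save(byte[] serializedState, Path target) {
        // Copy the array so later changes by the caller cannot corrupt the checkpoint.
        byte[] snapshot = serializedState.clone();
        return writer.submit(() -> {
            Files.write(target, snapshot);
            return target;
        });
    }

    // Stops accepting new checkpoints; already submitted writes still complete.
    @Override
    public void close() {
        writer.shutdown();
    }
}
```

A caller that needs the checkpoint to be durable before continuing can still block on the returned Future with get(). Otherwise the computation proceeds while the write completes.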


[Plot: x-axis: number of frames; series: Thread recovery, State reading]


Figure 10: Thread recovery 





4.4. Benchmarking the JVM 


Our thread mobility and thread persistence
services were integrated into the JVM. In order to
evaluate the performance of our extension of the JVM,
we compared it to the performance figures of the
standard JVM. We used two benchmarks: the
Embedded CaffeineMark 3.0 general Java benchmark
[Pendragon99] and the SciMark 2.0 numeric Java
benchmark [Pozo00]. In order to measure JVM
performance, the benchmarks were run with JIT
compilation disabled.














Test             Extended     Standard
                 JDK 1.2.2    JDK 1.2.2
Overall Score    1913         1913
Sieve            1447         1457
Loop             3314         3311
Logic            5403         5265
String           1922         1985
Float            1130         1134
Method


Figure 11: Benchmarking the JVM with 
Embedded CaffeineMark 3.0 


Embedded CaffeineMark consists of 6 tests:
finding prime numbers, loops, logic tests, String and
Float tests and method calls. The overall score is the
geometric mean of the individual scores, i.e., it is the 6th
root of the product of all the scores. The score for each
test is proportional to the number of times the test was
executed divided by the time taken to execute the test,
i.e., a higher number represents a better score. Figure 11
presents the results of benchmarking the standard
JDK 1.2.2 and our extended JDK 1.2.2. It shows that
our extension does not induce any loss of performance
on the JVM.
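
The way such an overall score is derived can be sketched as follows. The scores used in main are hypothetical values chosen for illustration, not the measurements of Figure 11, and the CaffeineMarkScore class is not part of the benchmark itself.

```java
// Minimal sketch of a geometric-mean overall score.
class CaffeineMarkScore {
    // Geometric mean: the n-th root of the product of n scores.
    // Summing logarithms avoids overflowing the product for large scores.
    static double overall(double[] scores) {
        double logSum = 0.0;
        for (double score : scores) {
            logSum += Math.log(score);
        }
        return Math.exp(logSum / scores.length);
    }

    public static void main(String[] args) {
        // Six hypothetical test scores, one per CaffeineMark test.
        double[] hypothetical = {100, 200, 400, 800, 1600, 3200};
        System.out.println(overall(hypothetical));
    }
}
```

Because the mean is geometric, doubling one individual score multiplies the overall score by the same factor whichever test it is, so no single test dominates the result.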










Tests                               Standard     Extended
                                    JDK 1.2.2    JDK 1.2.2
Composite Score                     8.9891       9.0977
FFT (1024)                          7.4484       7.6727
SOR (100x100)                       18.6662      18.7242
Monte Carlo                         0.9157       1.1166
LU (100x100)                        7.4484       7.6727
Sparse matmult (N=1000, nz=5000)    7.7224       7.5683


Figure 12: Benchmarking the JVM with
SciMark 2.0


SciMark 2.0 is a Java benchmark for scientific and
numerical computing. It consists of five computational
kernels: Fast Fourier Transform (FFT), Jacobi
Successive Over-relaxation (SOR), Monte Carlo
integration, dense LU matrix factorization and sparse
matrix-multiply. The kernels are chosen to provide an
indication of how well the underlying JVM performs on
applications utilizing these types of algorithms. The
problem sizes are purposely chosen to be small in
order to isolate the effects of memory hierarchy and
focus on internal JVM and CPU issues. This benchmark
reports a composite score in approximate Mflops
(millions of floating point operations per second).
Figure 12 shows the performance figures resulting from
benchmarking the standard JDK 1.2.2 and our extended
JDK 1.2.2. It shows that our extension does not induce
any loss of performance on the JVM.


5. Experimentation 


In this section, we describe two experiments that 
use our mobility service. The first experiment shows 
the usefulness of strong mobility and the second 
experiment shows how to build a dynamic 
reconfiguration tool on top of our mobility service. 
Finally, we discuss some implementation issues and 
solutions. 


5.1. Strong mobility: Mobile recursive Fractal 


Two degrees of application mobility can be
distinguished: weak mobility and strong mobility
[Fuggetta98]. With weak mobility, only data state
information and the application’s code are transferred.
Therefore, at the new location, the mobile application
has its up-to-date data but restarts execution from the
beginning. With strong mobility, the code of the
application and the state of data and execution are
transferred: the application at the destination location
resumes the execution at the point where it was
interrupted at the source location.

The choice between weak and strong mobility depends on
the application’s needs. Let’s consider a recursive Java
application. The recursive calls translate into a
succession of frames on the Java stack associated with
the underlying thread. How is this application made
mobile?

• Weak mobility does not consider the state of
execution (the thread’s state), so frames previously
pushed onto the Java stack are lost after the transfer
and the execution restarts from the beginning.

• Strong mobility captures the state of execution and
allows the execution to be resumed after the
transfer.


[Diagram: the application migrating from Machine 1 to Machine 2 and then to Machine 3]


Figure 13: Mobile Fractal 





We considered a recursive graphical Java
application: the Dragon fractal curve, where a small
dragon appears at a certain depth of recursion
[Mandelbrot75]. We implemented a Java Dragon
application and used our thread mobility service in
order to move the application, while it is running,
between several machines. Figure 13 illustrates this
experiment. The Dragon application is started on a
first machine, then moved to a second machine where it
resumes its execution and finally moved to a third
machine where it finishes its execution. The transfer of
the thread calculating the fractal is performed by an
external thread that calls the go method of our
MobileThreadManagement class.


5.2. Dynamic reconfiguration: Mobile Talk


In this section, we describe how our mobility 
service can be combined with other Java services 
(object serialization, dynamic class loading) in order to 
build a dynamic reconfiguration tool. 

We consider a Talk application where two remote
users exchange messages. Initially, each user starts an
instance of the Talk application on a personal
computer with a graphical user interface. Each user has
two communication channels: an input channel to
receive messages from the remote user and an output
channel to send messages to the remote user. During the
talk, one of the users decides to transfer the application
to a minimal host with limited physical characteristics
(a mobile phone for example) and to resume its
execution there. This dynamic reconfiguration of the Talk
application is illustrated by figure 14 and has the
following requirements:

• Moving a running application from one host to
another.

• Handling communication channels during transfer.

• Replacing the graphical user interface by a textual
user interface when arriving on the destination host
because of the limited physical characteristics.


[Diagram: migration of the Talk application between hosts, involving serialization, dynamic class loading and de-serialization]


Figure 14: Mobile Talk 


To transfer the running mobile Talk application to 
a new host, our mobility service can be used: it takes 
the current state of the application into account. To 
transfer the application to a mobile phone, the mobility 
service must use the Wireless Application Protocol 
(WAP) [WAPfactory 00]. 

To tackle the problems of communication
channels and user interface, the Java serialization and
dynamic class loading services are adapted. In fact, our
mobility service relies on both serialization and
dynamic class loading to transfer, respectively, the
objects and the classes used by the application at
migration time. These two features can be specialized
as follows:

• The serialization of the communication channels
can be adapted in order to send a particular
message to the remote user informing him or her about
the upcoming migration and then to close the
connections. Symmetrically, the de-serialization of
the communication channels can be adapted in
order to recreate new channels and reestablish the
connection with the remote user.

• The dynamic class loading can be adapted in order
to use a textual user interface rather than the
graphical one on the mobile phone.
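
The specialization of channel serialization described above can be sketched with the standard writeObject/readObject hooks of Java serialization. The Connection and OutputChannel classes and the "MIGRATING" message are hypothetical; a real Talk application would wrap a socket instead of the in-memory stand-in used here, and this is not the code of our prototype.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a network connection.
class Connection {
    final String peer;
    boolean open = true;
    final List<String> sent = new ArrayList<>();

    Connection(String peer) { this.peer = peer; }
    void send(String message) { sent.add(message); }
    void close() { open = false; }
}

// Output channel whose (de-)serialization is adapted for migration.
class OutputChannel implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String peer;
    private transient Connection connection; // never transferred as-is

    OutputChannel(String peer) {
        this.peer = peer;
        this.connection = new Connection(peer);
    }

    Connection connection() { return connection; }

    // Called by Java serialization: warn the remote user, then close.
    private void writeObject(ObjectOutputStream out) throws IOException {
        connection.send("MIGRATING");
        connection.close();
        out.defaultWriteObject(); // writes only the non-transient fields
    }

    // Called by Java de-serialization: recreate the channel on the new host.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        connection = new Connection(peer);
    }
}

class ChannelMigrationSketch {
    // Simulates the transfer by a serialization round trip in memory.
    static OutputChannel migrate(OutputChannel channel) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(channel);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (OutputChannel) in.readObject();
        }
    }
}
```

After ChannelMigrationSketch.migrate(channel), the old connection is closed and has received the notification, while the returned channel holds a fresh connection to the same peer.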

Finally, the combination of our mobility service,
serialization and dynamic class loading enables
the building of a complete dynamic reconfiguration
tool. This application was tested with a
prototype implementation on our extended JDK 1.2.2.
A port of our services to the K Virtual Machine, a
lightweight JVM, is planned [Sun00b].


5.3. Discussion 


In this section, we discuss some issues
encountered when implementing thread mobility and
thread persistence. Let’s focus our attention on the
mobility of a thread:

• What happens if a thread moves from a source host
to a destination host while it is using objects shared
with other threads on the source host?

• How are the communication channels connecting
several threads handled when one of these threads
moves to a new host?

• What happens if a thread that belongs to a multi-
threaded application moves to a new host?

We now tackle each of these issues and propose 
possible solutions. 

What happens if a thread moves from a source 
host to a destination host while it is using objects shared 
with other threads on the source host? A first solution 
consists in replicating the shared object and transferring 
it with the mobile thread [Garcia-Molina86]. In this 
case, the consistency of the replicas must be managed. 
Another solution to the problem of shared objects is to 
use proxies on the destination host in order to allow 
remote access to shared objects. A problem of object 
availability occurs if the source host crashes [Chou83]. 

How are the communication channels connecting
several threads handled when one of these threads
moves to a new host? A first approach consists of using
proxies on the destination host in order to access the
communication channels remotely. A problem of
channel availability occurs if the source host crashes.
Another approach consists of closing the channels on
the source host and recreating them on the destination
host. In this case, messages in transit must be redirected
to the new location and the naming of the new channels
must be updated on other hosts.

What happens if a thread that belongs to a multi-
threaded application moves to a new host? The thread
can move alone to the destination host and
communicate with the other threads remotely, or it can
move with all the other threads or with a subset of
them.

Finally, for each of the discussed issues, the
solution strongly depends on the application's needs. That
is why we deliberately chose not to impose a particular
solution at the level of our thread mobility and thread
persistence services. The programmer of the application
is thus free to choose the most appropriate approach.


6. Related work 


Many systems have been developed providing 
process mobility and persistence, considering either 
homogeneous or heterogeneous processor architectures. 
There are a number of surveys discussing these features 
[Milojicic97] [Deconinck93]. Both mobility and 
persistence of control flows are based on a mechanism 
that enables the capture and the restoration of
execution state. Let’s focus our attention on such
mechanisms in the Java environment.

Three main approaches to address the problem of 
capturing/restoring the state of Java threads are 
distinguished: an explicit approach, an implicit 
approach based on a pre-processor of the application 
code and an implicit approach based on an extension of 
the JVM. 

In the first approach, which we call explicit 
management, the programmer of an application has to 
entirely manage the capture and the restoration of the 
state of his application. For this purpose, the 
programmer has to explicitly add supplementary code 
in fixed points of his program and usually has to 
manage his own program counter. The added code 
manages a backup object in which information relative 
to the state of the application is stored. The backup 
object is then used in order to restore the application 
execution. When restoring the state of the application, 
the first statement of the program is a branch to the 
point where the program must continue. This approach
is not flexible and requires modifying the
application itself whenever new backup points are added. It
is used in most applications based on
mobile agent platforms [Chess95] that provide weak
mobility, such as Aglets [IBM96] and Mole
[Baumann98].
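
The explicit management style can be sketched as follows for a simple summation loop. The Backup and ExplicitSum classes are hypothetical and are not taken from any of the cited platforms; the budget parameter only simulates an external request to stop at a backup point.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Backup object: everything the program needs in order to continue.
class Backup implements Serializable {
    private static final long serialVersionUID = 1L;
    int pc = 0;    // program counter managed by hand
    int i = 1;     // loop index
    long sum = 0;  // partial result
}

class ExplicitSum {
    // Computes 1 + 2 + ... + n, keeping all state in the backup object.
    // Returns true when finished, false when stopped at a backup point
    // after 'budget' loop iterations.
    static boolean run(Backup b, int n, int budget) {
        switch (b.pc) { // first statement: branch to the resume point
            case 0:
                b.i = 1;
                b.sum = 0;
                b.pc = 1;
                // fall through
            case 1:
                while (b.i <= n) {
                    if (budget-- == 0) {
                        return false; // backup point: state is already in b
                    }
                    b.sum += b.i;
                    b.i++;
                }
                b.pc = 2;
                // fall through
            default:
                return true;
        }
    }

    // Simulates moving the backup object to another host.
    static Backup transfer(Backup b) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(b);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (Backup) in.readObject();
        }
    }
}
```

Even in this small example the program logic is interleaved with state management, and adding a second backup point would require a new case label and further changes to the method body.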

The two other approaches, which we call implicit, 
provide a transparent service for capturing/restoring 
thread state. The service is independent from the 
application code and is provided as a function that may 
be called by the application itself or by an external 
application. These two approaches differ by their 
implementation: 

• The first implicit approach consists of pre-
processing the source (or byte) code of the
application in order to insert statements. The
inserted code attaches a backup object to the
application. While the application is running, the
backup object is updated with the state of the
application. When an application requires a
snapshot of its state, it just has to use the associated
backup object. In order to restore the execution
state, data stored in the backup object are used to
re-initialize the application in the same state as at
snapshot time. This restoration is achieved by re-
executing a different version of the application
code (produced by the pre-processor) in order to
rebuild the stack and re-initialize the local variables
with the values stored in the backup object. The
main motivation of this approach is that it does not
modify the JVM. Its drawback is that it induces
a significant overhead on application performance
due to the inserted code, and on execution
restoration, which requires a partial re-execution of
the application. The Wasp project provides a Java
mobile agent platform based on a pre-processor
which instruments the source code of Java
applications [Fünfrocken98]. Several
implementations of Java thread mobility based on a
pre-processor of the bytecode have been proposed
[Truyen00] [Sakamoto00].

• The second implicit approach consists of extending
the JVM in order to make threads’ state accessible
from Java programs. This extension must provide a
facility for extracting the thread state and storing it
in a Java object. The extension must also provide a
facility for building a new thread initialized with a
previously captured state. These facilities can only
be used on extended virtual machines. We
followed this last approach for two reasons:

    • It reduces the overhead on application
    performance (no inserted code) and also
    reduces the cost of the capture/restoration service
    (its implementation is mainly native).

    • Since the thread state capture/restoration
    service has many applications, we believe that
    it is a basic functionality which must be
    integrated within the JVM.




This solution has been used in the 
implementation of the Sumatra mobile agent 
platform [Ranganathan97]. Unlike Sumatra which 
supplies a mobility service, our implementation 
provides a generic service intended for other uses 
than mobility, like persistence [Bouchena99]. The 
recently proposed Merpati system also follows this 
approach [Suezawa00]. It makes the whole JVM 
mobile or persistent, with all its threads, while our 
services are fine-grained and can be applied to one 
thread. 

To summarize, our services provide a transparent
and fine-grained Java thread state capture/restoration
facility. They can be used for several purposes,
including thread mobility and thread persistence. They are
integrated into the JVM and thus present competitive
performance figures. A comparison between the
performance of the first implicit approach (Wasp’s
mobility service) and the second implicit approach (our
mobility service) can be found in [Bouchenak00].


7. Conclusion and future work


Since the Java virtual machine does not provide 
any access to the state of Java threads, we designed and 
implemented a new service for the capture and 
restoration of thread state. Our capture/restoration 
service is generic: we used it as a basis for the 
implementation of thread mobility and thread 
persistence services. With these services, a running 
Java thread can, at an arbitrary state of its execution, 
migrate to a remote machine or be checkpointed on disk 
and then recovered. In addition, the migration or the 
checkpointing of a thread can be initiated by the thread 
itself or by another thread. 
Our services were integrated into the JVM, so
they provide acceptable performance figures without
inducing overhead on JVM performance. Finally, we
experimented, using a prototype implementation, with a
dynamic reconfiguration tool based on our mobility
service and applied to a running distributed application.
The lessons learned from this experiment are that:

• It is possible to extend the Java virtual machine
with thread mobility and persistence services
without re-designing the whole JVM.

• This implementation provides reasonable and
competitive performance.

At the present time, we are considering the usage
of our services in real world applications such as
dynamic load balancing in distributed systems and the
integration of our services into distributed Java virtual
machines. We also plan to port our services to the K
Virtual Machine, the lightweight JVM, in order to make
them available on small devices such as phones and
PDAs [Sun00b].


Abstract


Although the benefits of software component com- 
position are today widely accepted, component ori- 
ented software development is not yet as widespread 
as its multiple advantages may suggest. This is so 
in spite of the maturity reached by several compo- 
nent models (Microsoft’s COM, JavaBeans, OMG’s 
CORBA), and their general acceptance by large 
communities of developers. Thus, while compo-
nents are being “used” in software development, the
development process itself is not fully component 
oriented. One major roadblock limiting the adop- 
tion of a component oriented development process 
is the lack of viable component composition lan- 
guages. This paper introduces a component com- 
position language specifically designed for the com- 
position of JavaBeans components. 


The Bean Markup Language (BML) supports 
component composition in a first-class manner. 
BML has language constructs for describing inter-
component bindings, for constructing aggregates of
components, for macro expansion and for imple-
menting certain types of recursive compositions.
Further, it allows the specification of “glue code” 
in any traditional scripting language (for example, 
JavaScript) to enable powerful adaptation during 
composition. 


“Presently Vice President, Development & Architecture, 
Vast Video Inc., Astoria, NY 11106. 


1 Introduction 


The benefits of software component composition are
today widely accepted (see [6, 7, 21, 11, 2, 15]). How-
ever, component oriented software development is
not yet as widespread as its multiple advantages
may suggest. This is so in spite of the maturity 
reached by several component models (Microsoft’s 
COM, JavaBeans, OMG’s CORBA), and their gen- 
eral acceptance by large communities of developers. 
Thus, while components are being “used” in soft- 
ware development, the development process itself is 
not fully component oriented. One major roadblock 
limiting the adoption of a component oriented devel- 
opment process is the lack of viable component com- 
position languages. As has been argued in [13] and 
[21], component-oriented development is likely to be 
much more successful when first-class mechanisms 
enabling simple forms of composition are used. 


Component-oriented development is a natural 
evolution of the object-oriented development 
paradigm. Components provide a programming ab- 
straction in terms of events, properties and meth- 
ods. The properties and methods of a component 
allow the component to be configured and events 
are how the component communicates interesting 
information to its consumers. In the component- 
oriented development model, application develop- 
ment becomes a matter of “scripting together” a set 
of such components, where the components them- 
selves are sometimes bought from a collection of 
third parties and sometimes developed in-house. 
This type of composition enables loose coupling and 
provides the necessary hooks for adapting pre-built 
components as needed to form the desired aggre-
gation. Object-oriented development is clearly the
predominant methodology used in developing the
components themselves [21, 11].


The key technologies that enable component- 
oriented development are a component model and 
a composition mechanism. A component model is 
a set of conventions and a run-time architecture 
that provide an environment to define and manipu- 
late software components. The definition of a soft- 
ware component varies. However, a common theme 
is that it is an executable, self-contained, dynam- 
ically loaded/bound module that exhibits certain 
types or interfaces or contracts to other components 
that adhere to the same component model [7]. The 
three most popular component models in use to- 
day are: Microsoft’s COM [5], OMG’s CORBA [18] 
and JavaSoft’s JavaBeans [20]. The work described 
in this paper assumes the JavaBeans component 
model, but it could be implemented with any of 
these models. 


Component composition is the key programming 
task required and enabled by component-oriented 
development [2]. Component composition is the 
process of creating component instances, configur- 
ing them and putting them together to form com- 
posite components or applications. Configuring 
components consists of manipulating their proper- 
ties and also invoking their methods. Putting com- 
ponents together consists of describing component-
to-component relationships as well as aggregations
such as component hierarchies. An ideal language 
for component composition would have first-class 
syntax and semantics to support such composition 
operations. 


Component composition can be performed in a vari- 
ety of ways. The obvious way is to use a traditional 
programming language to write code that creates in- 
stances of components and composes them together 
by using the appropriate method calls. This task 
is commonly done by the “main” procedure of an 
application or that of a composite component. 
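

As an illustration of composition written in a traditional programming language, the following sketch creates two components, configures a property and binds an event source to a listener with ordinary method calls. The Thermometer and Display components and the Composition class are hypothetical examples that follow the JavaBeans naming conventions for properties and event listeners; they are not taken from BML or from any existing bean library.

```java
import java.util.ArrayList;
import java.util.EventListener;
import java.util.EventObject;
import java.util.List;

// Event fired by the hypothetical Thermometer component.
class TemperatureEvent extends EventObject {
    final double celsius;

    TemperatureEvent(Object source, double celsius) {
        super(source);
        this.celsius = celsius;
    }
}

interface TemperatureListener extends EventListener {
    void temperatureChanged(TemperatureEvent event);
}

// Source component: exposes a property and an event.
class Thermometer {
    private final List<TemperatureListener> listeners = new ArrayList<>();
    private double celsius;

    public void addTemperatureListener(TemperatureListener l) { listeners.add(l); }
    public void removeTemperatureListener(TemperatureListener l) { listeners.remove(l); }
    public double getCelsius() { return celsius; }

    public void setCelsius(double value) {
        celsius = value;
        TemperatureEvent event = new TemperatureEvent(this, value);
        for (TemperatureListener l : listeners) {
            l.temperatureChanged(event);
        }
    }
}

// Target component: configured through a property, driven by events.
class Display implements TemperatureListener {
    private String prefix = "";
    private String text = "";

    public void setPrefix(String value) { prefix = value; }
    public String getText() { return text; }

    @Override
    public void temperatureChanged(TemperatureEvent event) {
        text = prefix + event.celsius;
    }
}

class Composition {
    public static void main(String[] args) {
        Thermometer sensor = new Thermometer();  // create instances
        Display display = new Display();
        display.setPrefix("T=");                 // configure a property
        sensor.addTemperatureListener(display);  // bind source to target
        sensor.setCelsius(21.5);
        System.out.println(display.getText());   // prints "T=21.5"
    }
}
```

The three composition steps in main are ordinary statements, indistinguishable in form from any other method call in the program.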


Traditional programming languages are however not 
the best suited for component composition. Since 
their syntaxes and semantics do not support com- 
ponent composition concepts in a first-class manner, 
composition operations are supported using other 
existing language elements like, say, method calls. 
As a result, the composition operations are lost 
amongst the rest of the code and the compositional 
structure is obscured. A discussion of the short- 
comings of object oriented languages when applied 
to component composition can be found in [1]. 


A second common approach to software composi- 
tion is the use of scripting languages. Scripting 
languages are programming languages which are 
supposed to be in some sense “easier” to program 
with: they are typically loosely typed and inter- 
preted. They are commonly used for application 
prototyping, configuration, customization and ex- 
tension [17, 11]. Scripting is a natural counter- 
part to component-oriented development - compo- 
nents can be written in standard object-oriented 
languages and then “glued together” to form ap- 
plications. However, as a mechanism for com- 
ponent composition, scripting languages such as 
PERL [22], Tcl [16] and JavaScript [8] suffer from 
the same problem as traditional programming lan-
guages: their lack of first class support for com-
position. In fact, scripting languages do not add
abstractions to programming; their primary goals 
are to reduce complexity by eliminating syntax and 
types to make programming “easier,” not to change 
the level of abstraction. 


Visual composition is another popular approach to 
component composition. A visual builder allows one 
to select components from a palette, place them on 
a composition editor and visually “wire” together 
the component interactions. Where required, ad- 
ditional behavior can be added using scripts, for 
instance, to intercept an event on its path from a 
source to a target and trigger special actions. The 
JavaBeans component model in fact recognizes the 
role of visual builders and provides for the com- 
ponent to distinguish between build-time and run- 
time. Using this, a JavaBeans component may, for 
example, present a build-time user interface that 
can be used to configure the component. Visual 
composition clearly provides first-class support for 
the composition operations described earlier. Vi- 
sual composition’s power is also its failing however: 
it is interactive and graphical, and, consequently, 
not an option in scenarios where the composition is 
done by a non-interactive mechanism. This is the 
case, for instance, when user interfaces are automat- 
ically generated from data input specifications. A 
first class representation of the composition would 
allow both generation methods (interactive and non- 
interactive) to interoperate, by acting as a neutral 
intermediate format. Moreover, if appropriately de- 
signed, the intermediate format could also be di- 
rectly manipulated by developers. 


Hence, a solution to the shortcomings of existing
composition techniques is to introduce a compo-
nent composition language, that is, a language in
which the basic component composition operations
are supported in a first class manner.


Extending the syntax and semantics of existing lan- 
guages looks like an attractive option, but it can be 
argued that a special purpose composition language 
will do a better job capturing the specific nature of 
composition operations. Moreover, it is noted in [1] 
that object-oriented languages like Java and object 
oriented design tend to be used to produce domain 
specific designs, rather than standard architectures 
more suitable for the kind of reuse expected from 
software components. 


The most relevant effort in the definition of a com- 
position language is probably Piccola [1], a declar- 
ative composition language founded on a variant of 
the π-calculus, in which components are viewed as
interacting processes. Piccola is a very small lan- 
guage, which is able to support a variety of compo- 
nent models through the definition of different com- 
ponent composition (“architectural”) styles. Pic- 
cola is an on-going research effort. 


The introduction of a new language has some major 
practical problems, though. It requires retraining 
and the development or adaptation of tools to sup- 
port it. Furthermore, component models and run- 
time models to interact with other languages must 
be developed. Thus, an approach where a new lan- 
guage, a new component model and a new run-time 
are needed is not immediately suitable as a mecha-
nism to enable component-oriented development in
practice. 


Hence, the problem we are interested in can be for- 
mulated as follows: 


How to design a component composition language 
which can be seamlessly integrated in today’s soft- 
ware development environment. 


We believe that an answer to this problem can be 
a key to successfully driving today’s development 
methodology toward the component oriented de- 
velopment paradigm. This paper describes an an- 
swer to this problem, the Bean Markup Language 
(BML), a declarative language for the composition 
of JavaBeans components. 


The rest of the paper is organized as follows. Section
2 states the requirements for a composition language
that can capture the design problem stated above.
Section 3 describes the design of the BML language
and how BML addresses these requirements. Sec-
tion 4 describes the BML implementation and run-
time support. Finally, in Section 5 we address open
problems and research issues.


2 Requirements for a Composition 
Language 


The purpose of this section is to map the design 
problem stated in the Introduction to a list of design 
requirements. The starting problem has two parts: 
designing a composition language, and assuring its 
easy integration in development environments. 


Several papers have dealt with the problem of spec- 
ifying requirements for successful composition lan- 
guages. The following discussion owes much to the 
ones found in [13], [14] and [3]. 


Our first requirement states a list of composition 
operations that a composition language should sup- 
port. 


1. The following composition operations must
be supported by the language:


• Binding communication channels.
Communication channels let components
exchange data and invoke behavior. Good
examples are pipes and filters, and event
notification in JavaBeans.


• Creating higher level component ag-
gregates. In this operation components
are combined to produce higher order
functional constructs. The combination
typically involves creating a hierarchy of
components, as when creating graphical
user interfaces.


• Macro expansion of parametrized com-
ponents. Macro expansion can be used
in several ways to compose components.
In the COMPOST language [3], source
components are connected together by ex-
panding (binding) “generalized program
elements” present in each component’s
code. Another case of composition by
macro expansion is described in [4], and
[19] discusses a mechanism to main-
tain correct scoping while generating pro-
grams.


• Recursive component composition.
Component composition is used to create
new components, rather than an applica-
tion. This is a powerful technique that en-
ables components to become software ab-
stractions at different levels, and provides
support for top-down progressive refine-
ment design strategies. It also has an
important role in providing scalability to the
language, since it allows using the same
language composition abstractions at dif-
ferent configuration levels.


A language solely devoted to component composi- 
tion must also provide effective separation of con- 
cerns between the person doing the composition and 
the developer of components. This is nothing but 
a restatement of the principle that the composi- 
tion of components must require no knowledge of 
their implementations. In particular, the language 
must provide a way to address “compositional mis- 
matches”, i.e. situations when the interfaces of two 
components are incompatible and don’t allow direct 
composition. 


2. The language should allow the specification of
“glue code” to deal with compositional
mismatches.


Glue code provides the bridge through which the
two interfaces can interact. In object oriented design
this corresponds to the “adapter” pattern [9].


The next requirement deals with the important issue 
of reusing component application designs. 


3. The language should support component 
frameworks. 


Here the notion of a component framework is similar
to the frameworks found in object oriented design;
see [9, 10] for instance. It is defined in [21]
as a software architecture that provides basic
relationships among components and allows instances of
those components to be plugged into the framework.
Frameworks are important tools that provide
component assemblers with the infrastructure needed to
build structured applications. Frameworks are also
important as a knowledge sharing mechanism and
as enablers of large scale component oriented
development.


In order to assure seamless integration of the
language into current development environments, we
state in our requirements list the need for low
adoption costs, and the ability to reach as many
development platforms as possible:


4. Reduce to a minimum the learning process for 
the language. In particular, use whenever pos- 
sible existing languages, syntactic and semantic 
conventions. The Java language and the XML 
syntax would be good starting points according 
to this criterion. 


5. Eliminate the need for new support tools, 
whenever possible. Existing development en- 
vironments should be able to provide support 
for the language with minimal investment. 


6. The language primitives must allow easy exten- 
sion to support alternative component models. 
While the focus of this work has been the Jav- 
aBeans model, it should enable a direct exten- 
sion to support the COM and CORBA models. 


3 BML: A Composition Language for 
JavaBeans 


The BML language was designed to meet many of 
the requirements we have identified in the previous 
section. This section describes the BML language, 
its design principles, and some of its most relevant 
features. 


This section is organized as follows: first we explain
some of the general design decisions behind BML.
Then we describe the major language elements
and explain the support that BML provides for the
composition of components, along with other relevant
features of the language.


Design Principles 


BML has been designed as an XML-based declara- 
tive language for describing the composition of Jav- 
aBeans applications. This statement summarizes 
three major design principles, which we review in 
this section. 


XML syntax. BML intentionally de-emphasizes
the importance of syntax. Of the two alternatives
of choosing a syntax with multiple elements
and structures (e.g., a Java-like syntax), or following
a relatively “syntax-free” approach (e.g., the
Lisp way), the second option was judged more likely
to allow the language to satisfy requirements 4 and
5 from Section 2. This is the reason why XML was
chosen as the syntactic model for the language. Its
XML syntax is in fact the main reason why BML
complies with those two requirements.


XML languages follow a relatively simple syntax
model (see [12]), and are described using the XML
DTD [24] or the XML Schema [25] metalanguages.
The XML model allows very limited syntactic
options, essentially the choice of whether to use an
XML attribute or an XML element to represent
features of the language. XML, on the other hand, is
already a widely embraced industry standard: its
simple syntax is well known by many developers,
and supporting middleware is available for all
major computing platforms.


While a Java-like syntax would have the advantage
of providing a certain degree of familiarity to Java
developers, it would also have the disadvantage of
being only Java-like, and not exactly Java. In fact,
the intended user of BML is the component composer,
who may not even be a Java developer.


A declarative language. BML is designed to de- 
scribe the composition of a set of components rather 
than to describe how the composition is to be im- 
plemented. To fully understand this distinction we 
state here the four phases of the component devel- 
opment process: 


1. Authoring-time. Components are created, 
typically using an object oriented language, and 
packaged for use by third parties. 


2. Composition-time. This is design-time, 
when components are selected, configured and 
the desired composition is described. The role 
of component languages is to capture this com- 
position. 


3. Assembly-time. Part of the application 
startup time. The composition described in the 
component language script is realized into an 
executable application, typically by a compo- 
nent language processor. 


4. Run-time. After the composition is performed,
the application runs to perform its function.
Non-compositional processing happens at this
time, typically by executing the components’
own code.


The role of BML is to represent the structure of
a composed application as designed at composition
time. The actual assembly of the components is
the role of the language processor at assembly time.
BML defines an assembly-time environment to support
this distinction (the assembly-time environment
is described later in this section and in Section 4).
This is the reason why we describe BML
as a declarative composition language. It must be
remarked that, while BML allows the inclusion of
sections of “glue code” for the purpose of solving
compositional mismatches and “configuration
instructions” for configuring individual components
for composition, the BML language’s compositional
elements are declarative.


Application versus component composition. 
Compositions can be either “final” or reusable. Fi- 
nal compositions are applications. Reusable compo- 
sitions are themselves components and can be used 
in new compositions, both final and reusable. The 
main difference between the two is that reusable 
compositions present a well defined public interface 
that identifies them as components in the compo- 
nent model under consideration, and allows reuse. 
BML is designed to enable the creation of both types
of compositions, by providing component definition
language elements in addition to the basic
compositional and configurational elements. The ability
to define reusable components in the language
provides support for the recursive composition
requirement listed in Section 2.


3.1 BML Language 


As an XML based language, BML uses different 
XML elements for each composition operation. This 
section presents the BML solution to the compo- 
sition language problem by describing how it ad- 
dresses each of the requirements listed earlier. We 
describe only the essential features of each element 
in this document; complete documentation can be 
found in the BML User’s Guide, which is part of 
the BML distribution [23]. The syntax of BML is 
summarized in a BNF-like form in Table 1. 


The role of BML is to capture a composition. A
composition script is represented in BML by a
<script> element. The contents of this element are
arbitrary BML elements, and the result of evaluating
it is the value of evaluating the last child element.


    C                ::=  (<bean> | <string> | <property> | <field> |
                           <call-method> | <script>)
    S                ::=  (C | <event-binding> | <add>)
    V                ::=  (C | <cast>)

    <script>         ::=  (S | <cast>)+
    <bean>           ::=  <args>? S*
    <args>           ::=  V+
    <property>       ::=  V?
    <field>          ::=  V?
    <event-binding>  ::=  <bean> | <script>
    <call-method>    ::=  V*
    <cast>           ::=  V?
    <string>         ::=  text
    <add>            ::=  V+


Table 1: BML Syntax Summary


We start the description of the BML language with 
a small, yet complete example. 


The Juggler 


This section provides a simple example of a BML
application. The purpose is to give the reader an
early view of a significant subset of BML.


The example shows how BML can be used to com- 
pose a collection of AWT components into an appli- 
cation. The application includes an animation com- 
ponent and two buttons that control it, as well as 
a window frame component that acts as a container 
component. Figure 1 shows the resulting applica- 
tion. The example code is shown in Figure 2. 


We now briefly explain how the code in Figure 2
works.


Note that line 0 is the XML declaration, which
is required of XML documents. In line 2
a new script of BML statements is started with
the <script> element. The <bean> element in
line 3 creates a component of type java.awt.Frame
and uses the id attribute to assign to it the name
“frame”. In line 4 the title property of the frame
bean is set. The <event-binding> element in line
5 binds the script in lines 6-10 so that it is run when
a “window” event occurs and the event is delivered
via the windowClosing method. The script contains
one statement (lines 7-9), which causes the program
to exit.


On line 14 the animator component is created and
given the name “Juggler”. Lines 13-16 aggregate
this component into the container “frame” at the
center position using the <add> element. On line
18 a button component is created; its label property
is set to “Start” on line 19. Line 20 binds the
script on lines 21-23 to be run when an “action”
event occurs on the button. The script invokes the
start method of the “Juggler” component using the
<call-method> element. Observe that the target
component is identified using the “target” attribute.
Lines 17-27 aggregate the button component into
the container “frame” in the north position. Similarly,
lines 28-38 create another button component
(this one for stopping the animation) and aggregate
it into the “frame” component. Lines 40-41 invoke the
“pack” and “show” methods of the frame in order
to bring it to the screen. Finally, line 43 invokes
the “start” method of the animator component to
initiate the animation.


Figure 1: The Juggler Application


<?xml version="1.0"?>

<script>
  <bean class="java.awt.Frame" id="frame">
    <property name="title" value="IBM Juggler"/>
    <event-binding name="window" filter="windowClosing">
      <script>
        <call-method target="class:java.lang.System" name="exit">
          <cast class="int" value="0"/>
        </call-method>
      </script>
    </event-binding>

    <add>
      <bean class="demos.juggler.Juggler" id="Juggler"/>
      <string value="Center"/>
    </add>
    <add>
      <bean class="java.awt.Button">
        <property name="label" value="Start"/>
        <event-binding name="action">
          <script>
            <call-method target="Juggler" name="start"/>
          </script>
        </event-binding>
      </bean>
      <string value="North"/>
    </add>
    <add>
      <bean class="java.awt.Button">
        <property name="label" value="Stop"/>
        <event-binding name="action">
          <script>
            <call-method target="Juggler" name="stop"/>
          </script>
        </event-binding>
      </bean>
      <string value="South"/>
    </add>

    <call-method name="pack"/>
    <call-method name="show"/>
  </bean>
  <call-method target="Juggler" name="start"/>
</script>


Figure 2: The Juggler Script


In the following sections we review in detail some
important aspects of the language.


Naming and Scoping in BML


A mechanism to identify components is fundamental
to any composition language. In the previous section
we have seen how beans can be created with the
<bean> element, named using the id attribute, and
located with the target attribute. Assigning names
to new components is optional, but if a name is
assigned then the component is registered in the current
scope with that name.


BML is a lexically scoped language. A scope 
is defined by the <script> and the <bean> ele- 
ments. The default scope is created by the open- 
ing <script> element and nested scopes are explic- 
itly created by nesting <script> elements; nested 
<bean> elements implicitly create a new scope. 
The scoping semantics are as usual with inside- 
out visibility and standard shadowing rules. The 
scope is represented at assembly-time as an “ob- 
ject registry”, a registry which provides a name-to- 
reference mapping and is part of the BML language 
processor’s environment during assembly-time. 
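

The scoping rules above can be illustrated with a small Java sketch of an object registry. This is only our reading of the description, not code from the BML distribution: the class name ObjectRegistry and the methods register and lookup are hypothetical, and the real registry may differ. Each registry holds the names registered in one scope and falls back to the enclosing scope, which gives inside-out visibility and shadowing.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a lexically scoped name-to-reference registry. */
class ObjectRegistry {
    private final ObjectRegistry parent;  // enclosing scope, or null for the outermost scope
    private final Map<String, Object> beans = new HashMap<>();

    ObjectRegistry(ObjectRegistry parent) {
        this.parent = parent;
    }

    /** Registers a bean under a name in this scope only. */
    void register(String name, Object bean) {
        beans.put(name, bean);
    }

    /** Resolves a name inside-out: this scope first, then the enclosing scopes. */
    Object lookup(String name) {
        if (beans.containsKey(name)) {
            return beans.get(name);
        }
        if (parent == null) {
            throw new IllegalArgumentException("no bean named '" + name + "' in scope");
        }
        return parent.lookup(name);
    }
}

class ObjectRegistryDemo {
    public static void main(String[] args) {
        ObjectRegistry outer = new ObjectRegistry(null);   // scope of the opening <script>
        outer.register("frame", "outer frame");
        outer.register("Juggler", "animator");

        ObjectRegistry inner = new ObjectRegistry(outer);  // scope of a nested element
        inner.register("frame", "inner frame");            // shadows the outer "frame"

        System.out.println(inner.lookup("frame"));    // resolved in the inner scope
        System.out.println(inner.lookup("Juggler"));  // resolved in the enclosing scope
        System.out.println(outer.lookup("frame"));    // the outer binding is untouched
    }
}
```

In this sketch a nested scope never modifies its parent, so names introduced inside a nested <script> or <bean> stay invisible to the enclosing script.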


A related issue is how BML uses XML’s containment
model to effect a Pascal-style “with” operator.
Recall from our previous example that the target
attribute is used to name the bean on which an
operation is to be performed. In BML the default target
for a composition operation is the closest enclosing
component, as in the next example.


<bean class='java.awt.Button'>
  <property name='label'
            value='Click Me'/>
</bean>


Configuration of Components and Type Conversion


JavaBeans components may have configurable properties.
BML uses the <property> element for this
purpose. The example in Figure 2 shows that the
value of the property can be encoded using the value
attribute, as long as this value can be represented
as a String.


When there is no possible string representation,
the <property> element is given one child element,
the result of evaluating which becomes the value
assigned to the property. The following example
shows how one can change the layoutManager property
of a Panel component:


<bean class='java.awt.Panel'>
  <property name='layoutManager'>
    <bean class='java.awt.BorderLayout'/>
  </property>
</bean>


The <property> element also supports retrieving
property values. This is effected by omitting any
value for the property; thus, if a value to assign the
property is not found, it is treated as an “rvalue”
instead of as an “lvalue.”


We now discuss type conversion issues arising when
property values are encoded as strings. We must
note first that this issue is a consequence of the lack
of typing information in XML, which results in all
the data being encoded as strings.


If the type of the property being set is not a string, a 
type conversion must be performed before the value 
can be set. For instance, one may wish to set color- 
valued properties by giving a string containing the 
RGB representation of the color. BML’s approach is 
to separate the type conversion problems from the 
compositional problems as much as possible. All 
the type conversion logic is considered part of the 
assembly-time environment of the language, and is 
not reflected in the BML script. 


The BML processor’s assembly-time environment 
contains a registry, called the type converter reg- 
istry, which is a collection of code that is able to 
convert data from one type to another. If a type 
conversion is deemed necessary, the processor will 
transparently invoke the appropriate converter in 
order to effect the setting of the property. The type 
converter registry mechanism serves to improve the 
declarative nature of BML: it enables one to concen- 
trate on the required composition operations and 
defer the issues of how to realize them until later 
(and probably to someone else). 
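

To make the idea of a type converter registry concrete, the following Java sketch shows one way such a registry could be organized. It is an assumption-laden illustration, not the BML implementation: the names TypeConverterRegistry, register, convert and withDefaults are hypothetical. Converters are keyed by source and target type, and a value that already has the requested type is passed through unchanged.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a registry of converters between Java types. */
class TypeConverterRegistry {
    private final Map<String, Function<Object, Object>> converters = new HashMap<>();

    private static String key(Class<?> from, Class<?> to) {
        return from.getName() + "->" + to.getName();
    }

    /** Registers a converter from one type to another. */
    <F, T> void register(Class<F> from, Class<T> to, Function<F, T> converter) {
        converters.put(key(from, to), value -> converter.apply(from.cast(value)));
    }

    /** Converts a value to the requested type, invoking a converter only if needed. */
    <T> T convert(Object value, Class<T> to) {
        if (to.isInstance(value)) {
            return to.cast(value);  // already of the right type
        }
        Function<Object, Object> converter = converters.get(key(value.getClass(), to));
        if (converter == null) {
            throw new IllegalArgumentException(
                "no converter from " + value.getClass().getName() + " to " + to.getName());
        }
        return to.cast(converter.apply(value));
    }

    /** A registry pre-populated with two illustrative string conversions. */
    static TypeConverterRegistry withDefaults() {
        TypeConverterRegistry registry = new TypeConverterRegistry();
        registry.register(String.class, Integer.class, Integer::decode);
        registry.register(String.class, Boolean.class, Boolean::valueOf);
        return registry;
    }
}

class TypeConverterDemo {
    public static void main(String[] args) {
        TypeConverterRegistry registry = TypeConverterRegistry.withDefaults();
        // A string attribute value is turned into the type the property setter expects.
        Integer rgb = registry.convert("0xff0000", Integer.class);
        Boolean visible = registry.convert("true", Boolean.class);
        System.out.println(rgb + " " + visible);
    }
}
```

Because the registry is consulted by the processor rather than by the script, adding a converter for a new property type in this sketch would require no change to any script that sets such a property from a string.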


Type conversions can also be explicitly requested
in BML. The <cast> element is a utility element
used to explicitly request a type cast, or to explicitly
invoke a type converter to change the type of a
value. An example is provided in line 8 of Figure
2, where a string to integer conversion is requested.
Explicit casts are commonly found in BML in
the arguments of a <call-method> invocation (lines
7 to 9 in Figure 2), a common way of performing
more complex configuration of components.


Binding Events


Binding communication channels is one of the main 
composition operations described in Section 2. In 
the JavaBeans model inter-component composition 
channels are event streams. In order to do the bind- 
ing, two requirements must be met: 


• The event source must be notified of the
listeners’ interest in receiving the events.


• Event listeners must be of a suitable type which
is statically defined by the event source.


BML uses the <event-binding> element for this 
purpose, as in the next example: 


<bean class='java.awt.Button'>
  <event-binding name='action'>
    <bean class='MyActionListener'
          id='al'/>
  </event-binding>
</bean>


Notice that the component 'al' must be of the
appropriate type for the binding operation to be
valid. This kind of binding of communication
channels is hence fairly restrictive, as the components
must be statically designed to be aware of each
other’s event types. The following section discusses
how this is generalized to make event bindings more
adaptable.


Writing “Glue” Code 


Composing components that are not pre-designed
to be linked together often requires the writing of
“glue” code to solve these compositional mismatches
(recall requirement 2 from Section 2). In the
JavaBeans case the problem is even worse because the
event binding architecture requires that the
event listener implement a certain interface type.


BML addresses this requirement by allowing the 
component composer to author glue code in any 
of several traditional scripting languages. The cur- 
rently supported languages include JavaScript, Jacl, 
JPython and VBScript. 


This is an important design point. Traditional
languages are better fit for writing glue code because,
typically, the glue code does not perform component
composition, but rather some type of data adaptation
to allow components to interact. A composition
language is clearly less suited for such tasks
than a traditional scripting language, except perhaps
for the most elementary ones. Observe also
that this further reinforces the clearer separation
between component authoring and component
composition: while JavaBeans authors are Java
programmers, component composers need not be so.


The glue code is directly embedded in the composition
script using a <script> element as the child of
an <event-binding>. In lines 20 to 24 in Figure 2,
for instance, a BML script is provided to cause the
invocation of the “start” method when an “action”
event is received.


The code in these scripts is executed at run-time 
when events are generated by the event source com- 
ponent. However, BML provides static scoping for 
the script, that is, any component that is referenced 
by the script and was previously registered within 
its lexical scope will be available during script eval- 
uation. Section 4 describes the implementation of 
<script>. 


Aggregation 


Aggregation of components into hierarchies is an- 
other major composition operation. BML sup- 
ports it through the <add> element. The fol- 
lowing example illustrates the process of adding 
a java.awt.Button component to a java.awt.Panel 
component: 


<bean class='java.awt.Panel'>
  <add>
    <bean class='java.awt.Button'/>
  </add>
</bean>


The meaning of an aggregation operation is defined
by the “container” into which aggregation is occurring.
This is the default target bean (in XML terms,
the parent <bean> element of the <add>), unless
otherwise stated by the <add> element. BML’s approach
is to stay away from differences in the semantics
of the operation; only its compositional significance
is of interest. Observe that the operation of
nesting a <bean> element inside another has very
different semantics from the aggregation operation,
since the first one corresponds only to the declaration
of a bean inside the parent’s scope.


Aggregations defined by different containers may
require different data to be specified before the
operation can be performed. In the above example,
the <add> element has only one child
because the layout manager that the panel uses
(java.awt.FlowLayout) does not require any other
information. However, the add operations included
in the example from Figure 2 take two arguments,
since the default layout manager for the
java.awt.Frame component, the BorderLayout,
requires that we indicate the layout area in which
the component is to be added. In general, the first
child element of <add> is expected to identify what
to add and any other children are expected to be
additional information as needed by the container’s
semantics for aggregation.


The mechanics of how the aggregation is imple- 
mented are part of BML’s assembly-time environ- 
ment. This includes a registry (the adder registry) 
of code fragments (adders) that implement specific 
aggregation operations for specific container types. 
The separation of the compositional meaning of the
operation from the mechanics of its implementation
serves to further increase the declarative
nature of BML: the component composer is
only concerned with the desired aggregation structure
and not with how that is to be actually realized.
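

A minimal Java sketch of an adder registry follows, under our own assumptions about its shape: the Adder interface, the AdderRegistry class and their methods are hypothetical names, and plain java.util collections stand in for AWT containers so that the example runs without a display. A Collection plays the role of a container that needs no extra information (like a FlowLayout panel), while a Map plays the role of one that needs a position argument (like a BorderLayout frame).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical adder: knows how to aggregate a child into one kind of container. */
interface Adder {
    void add(Object container, Object child, List<Object> extraArgs);
}

/** Hypothetical adder registry, keyed by container type. */
class AdderRegistry {
    private final Map<Class<?>, Adder> adders = new LinkedHashMap<>();

    void register(Class<?> containerType, Adder adder) {
        adders.put(containerType, adder);
    }

    /** Performs an add: the first argument is what to add, the rest is container-specific. */
    void add(Object container, List<Object> args) {
        if (args.isEmpty()) {
            throw new IllegalArgumentException("an add operation needs at least one child");
        }
        for (Map.Entry<Class<?>, Adder> entry : adders.entrySet()) {
            if (entry.getKey().isInstance(container)) {
                entry.getValue().add(container, args.get(0), args.subList(1, args.size()));
                return;
            }
        }
        throw new IllegalArgumentException("no adder for " + container.getClass().getName());
    }

    @SuppressWarnings("unchecked")
    static AdderRegistry withDefaults() {
        AdderRegistry registry = new AdderRegistry();
        // No extra information needed: just append the child.
        registry.register(Collection.class,
            (container, child, extra) -> ((Collection<Object>) container).add(child));
        // One extra argument names the position under which the child is stored.
        registry.register(Map.class,
            (container, child, extra) -> ((Map<Object, Object>) container).put(extra.get(0), child));
        return registry;
    }
}

class AdderDemo {
    public static void main(String[] args) {
        AdderRegistry registry = AdderRegistry.withDefaults();
        List<Object> panel = new ArrayList<>();
        Map<Object, Object> frame = new HashMap<>();
        registry.add(panel, Arrays.<Object>asList("button"));
        registry.add(frame, Arrays.<Object>asList("juggler", "Center"));
        System.out.println(panel + " " + frame);
    }
}
```

The script-level operation is the same in both cases; only the adder selected for the container type decides how the extra arguments are interpreted.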


Recursive Composition 


Recursive composition of JavaBeans requires a way 
to define a new component in terms of compositions 
of beans. The language elements presented so far 
deal with the connection of already defined beans, 
and are typically contained inside a <script> element.
When this element is the root of the XML
document, the BML script corresponds to a final
application, that is, it cannot be reused as a component
(although it can be reused through macro expansion,
as we explain later).


Recursive composition requires additional language 
support to define the constructor, properties, meth- 
ods and events of the new bean using compositions 
of beans. This section describes the creation of new
JavaBeans with BML. In the next section we present
the related function of macro expansion in BML,
which is a way of reusing preconfigured BML scripts.


The example in Figure 3 shows how the Juggler
application of Figure 1 can be wrapped in a bean. In
this particular case, almost all the code from Figure
2 has been included as the constructor of the new
bean, while two method calls have been exposed as
methods of the composition.


A new component type is defined in BML using the 
<beanDef> element. The class for the new com- 
ponent is derived from the name attribute. The 
constructor, properties, methods and events of the 
component are defined using the <constructorDef>, 
<propertyDef>, <methodDef>, and <eventDef> 
elements respectively. These definitions can in gen- 
eral be provided in two ways: by delegation or by 
direct implementation. 


When delegation is used, the composite’s property,
method or event is mapped to a property, method or
event of a bean which is part of the composition. For
example, in lines 59 to 66 of Figure 3, methods,
properties and events of the composite bean are defined
by delegating to the frame, Juggler, start and stop
beans. In all these cases, the name attribute gives
the name of the new method, property or event in
the composite, the sourceBean attribute is used to
identify the delegation bean, and method, property
and event identify the method, property and event
in the source bean.


When using direct implementation, the implemen- 
tation is specified by a nested <script> element con- 
taining a regular BML script (which includes no 
bean definition elements). An example of this is 
provided by the constructor in Figure 3, which in- 
cludes most of the code from Figure 2. Constructors 
are defined using a <constructorDef> element, and 
can only be defined by direct implementation, never 
by delegation. 


Constructors are different from other bean elements
in another important aspect. The naming scope
defined by the (top level) <script> element in a
constructor’s definition is considered global for the
complete bean definition. That is, the identifiers
introduced in this script are visible everywhere in the
bean. This is used, for instance, in the identification
of the beans used in all definitions by delegation. In
the examples from Figure 3, the names specified by
all the sourceBean attributes correspond to beans
that were registered in the constructor’s script. The
identifiers of beans defined in the implementation of
methods, properties or events are not visible outside
their own script block.


<?xml version="1.0"?>

<beanDef name="CompositeJuggler">
  <constructorDef>
    <script language="bml">
      <bean class="java.awt.Frame" id="frame">
        <property name="title" value="IBM Juggler"/>
        <event-binding name="window" filter="windowClosing">
          <script>
            <call-method target="class:java.lang.System" name="exit">
              <cast class="int" value="0"/>
            </call-method>
          </script>
        </event-binding>
        <add>
          <bean class="demos.juggler.Juggler" id="Juggler"/>
          <string value="Center"/>
        </add>
        <add>
          <bean class="java.awt.Button" id="start">
            <property name="label" value="Start"/>
            <event-binding name="action">
              <script>
                <call-method target="Juggler" name="start"/>
              </script>
            </event-binding>
          </bean>
          <string value="North"/>
        </add>
        <add>
          <bean class="java.awt.Button" id="stop">
            <property name="label" value="Stop"/>
            <event-binding name="action">
              <script>
                <call-method target="Juggler" name="stop"/>
              </script>
            </event-binding>
          </bean>
          <string value="South"/>
        </add>
        <call-method name="pack"/>
      </bean>
      <call-method target="Juggler" name="start"/>
    </script>
  </constructorDef>

  <methodDef name="show" sourceBean="frame" method="show"/>
  <methodDef name="start" sourceBean="Juggler" method="start"/>
  <methodDef name="stop" sourceBean="Juggler" method="stop"/>

  <propertyDef name="startLabel" sourceBean="start" property="label"/>
  <propertyDef name="stopLabel" sourceBean="stop" property="label"/>

  <eventDef name="window" sourceBean="frame" event="window"/>

</beanDef>


Figure 3: The Juggler Bean


Macro Expansion 


BML provides a form of macro expansion that allows
reusing existing BML scripts, which can then
be embedded and further configured in new scripts.


To achieve this, BML allows using the name of a
BML file as the value of the class name attribute in
the <bean> element used to instantiate the component.
The nested BML file is evaluated recursively
within a new scope of its own, and the resulting bean
is then used as the default target bean for further
composition operations.


Consider this example: 


<bean class='redbutton.bml'>
  <property name='label'
            value='Red Button'/>
</bean>


where redbutton.bml is:


<bean class='java.awt.Button'>
  <property name='background'
            value='0xff0000'/>
</bean>


In this example the first BML script takes the bean 
produced by evaluating redbutton.bml and then sets 
its label property. The file redbutton.bml takes a 
Button component and sets its background color 
property to red and returns it. This simple example
illustrates how a nested BML script can be used to
define a component which is then further configured
and composed.


This approach amounts to macro expansion without 
parameterization. BML in fact allows parameteri- 
zation of such scripts: the recursive invocation can 
be given arguments similar to how constructor argu- 
ments are given. The nested script can then retrieve 
the arguments and use them as it wishes. This al- 
lows the nested script to effectively be a template 
composition, with key parts filled in by the values 
of the parameters. 


This type of parameterized macro expansion is not
true recursive composition because we can only ma- 
nipulate the features of the returned component and 
not of an entire composition. See the previous sec- 
tion for a description of how recursive composition 
is supported in BML. 


4 Implementing BML 


We have implemented two BML language processors:
an interpreter and a compiler. Both implementations
are designed to be embeddable and provide
full access to the assembly-time environment as well 
as to the run-time environment. The assembly-time 
environment has been pre-populated with a collec- 
tion of type converters and adders that provide com- 
monly used type conversions and aggregation capa- 
bilities, respectively. These can be augmented by 
the host of the BML processor by accessing the en- 
vironment and adding new capabilities to the reg- 
istry. 


The interpreter receives the BML document as an
XML tree and proceeds according to whether the
outermost element is a beanDef element or a
<bean>/<script> element. In the latter case, it uses
Java reflection to implement the composition operations.
In the case of beanDef, the interpreter will use the
compiler to compile the bean definition upon first
use and then reuse the resulting class.
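

As a rough illustration of what implementing composition operations through reflection can look like, the following Java sketch creates a bean from a class name and sets and reads one of its properties through accessor methods found at run time. It is not the interpreter's code: ReflectiveComposer and its methods are hypothetical, DemoButton is a stand-in bean defined only for this example, and a real processor would also apply type conversion before invoking the setter.

```java
import java.lang.reflect.Method;

/** Stand-in bean used only for this illustration. */
class DemoButton {
    private String label = "";

    public String getLabel() {
        return label;
    }

    public void setLabel(String label) {
        this.label = label;
    }
}

/** Hypothetical sketch of reflective bean creation and property access. */
class ReflectiveComposer {
    /** Instantiates a bean from its class name, as a class attribute would request. */
    static Object createBean(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }

    private static String accessor(String prefix, String property) {
        return prefix + Character.toUpperCase(property.charAt(0)) + property.substring(1);
    }

    /** Sets a property by locating a one-argument setter that accepts the value. */
    static void setProperty(Object bean, String property, Object value) {
        String setter = accessor("set", property);
        try {
            for (Method m : bean.getClass().getMethods()) {
                if (m.getName().equals(setter) && m.getParameterCount() == 1
                        && m.getParameterTypes()[0].isInstance(value)) {
                    m.invoke(bean, value);
                    return;
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot set " + property, e);
        }
        throw new IllegalArgumentException("no suitable setter " + setter);
    }

    /** Reads a property through its getter (the "rvalue" use of a property). */
    static Object getProperty(Object bean, String property) {
        try {
            return bean.getClass().getMethod(accessor("get", property)).invoke(bean);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read " + property, e);
        }
    }
}

class ReflectiveComposerDemo {
    public static void main(String[] args) {
        Object button = ReflectiveComposer.createBean(DemoButton.class.getName());
        ReflectiveComposer.setProperty(button, "label", "Start");
        System.out.println(ReflectiveComposer.getProperty(button, "label"));
    }
}
```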


The BML interpreter implements static scoping by 
performing name registration and resolution against 
the object registry in scope. Each <script> element
introduces a new scope by creating a new
registry which cascades upwards to the scope that 
embeds it. For event handler scripts (“glue” code), 
which are scripts whose execution is deferred until 
run-time, static scoping is achieved by storing the 
statically scoped registry with the script to be run 
at run-time. For example, the event script in line 22 
of the example shown in Figure 2 binds statically to 
the registry in scope at assembly-time and then at 
run-time uses it to locate the “Juggler” component. 


The compiler receives the BML script as an XML 
tree and uses Java reflection to generate the appro- 
priate Java source code to implement the compo- 
sition operations. If the outermost element is not 
beanDef, the compiler places the resulting code in 
a main() method. Otherwise, the bean definition
guides the target code generated.


The previously identified “assembly-time” phase for 
such generated composition code hence occurs at 
the startup of the execution of the generated code. 
The compiler allows one to generate code that is
independent of the BML environment at assembly-time.
That is, it can resolve type converters and
adders as well as scoping at compile time if possible
so that the generated code is straight Java code. If
one wishes to have full embeddability of the generated
code with the BML assembly-time environment,
then it is necessary to generate code which
is BML dependent.


Implementing Event Bindings to Scripts 


BML supports writing arbitrary scripts to be run as
“glue” code. This is supported for any type of event
thrown by any bean and is implemented with event
adapters, event processors and the event adapter
registry. The model consists of an event-specific
adapter that receives the event from the source and
delegates it to a generic event processor, which then
runs the script. This approach is a decomposition
of the standard JavaBeans event binding model to
allow dynamic lookup and/or generation of event
adapters.


Event adapters must implement a simple interface 
that is capable of receiving a handle to an event 
processor. An event adapter must be implemented 
and available from the event adapter registry for 
each event listener type. When the BML processor
creates an event adapter and adds it as a listener
to an event source, it tells the adapter which event
processor to delegate the event to. Event processors
are the entry point to the BML runtime and are 
responsible for delivering the event to the intended 
recipient script. 


When an event adapter receives an event from an 
event source, it delegates the event to its event pro- 
cessor. The interpreter uses an event processor that 
actually delivers the event to a script and runs the 
script. The compiler can generate both customized 
event processors that perform this task, and cus- 
tomized event adapters that bypass the event pro- 
cessor mechanism entirely and directly deliver the 
event to the user’s script. 


The event adapter registry provides registration and
lookup services for event adapters. We have also im-
plemented the ability to generate event adapters on
demand in Java bytecode form. This eliminates the
need to hand-write event adapters in many cases.
(On-demand generation is sometimes not possible because
of security constraints of the runtime location; loading
dynamically constructed classes is not always permitted.)


5 Future Work 


Support Other Component Models 


An important objective of the BML project is to 
achieve wide acceptance in the software develop- 
ment community by supporting other component 
models. 


The challenge is to develop a common compo-
nent composition language supporting composition
operations for the three major component mod-
els, JavaBeans, COM, and CORBA, in such a way
that it can work with components from different
component models in the same application. This
work clearly depends on the availability of run-time
bridges between the different models. Some of
those already exist.


Concurrency Support 


BML has not dealt with the issue of object concur- 
rency. If components can be objects, and there is 
the possibility of concurrent execution, concurrency 
control can become a major issue. This is particu-
larly important when communication channels are
event streams, as in the JavaBeans model, since
event handling is a common cause of race condi-
tions and deadlocks in multithreaded environments
(see [20]).


The question is whether, in these circumstances, the 
composition language should assure correct synchro- 
nization among components. When components 
from third party authors are used (maybe from sev- 
eral of them) and concurrent execution is necessary, 
it may become impossible to predict correct syn- 
chronization of the composition, and a positive an- 
swer would seem appropriate. This is the view ex- 
pressed in [13], which underlies the design of the 
Piccola language. 


6th USENIX Conference on Object-Oriented Technologies and Systems 







The issue for BML is whether it is possible to inte-
grate concurrency control while still keeping the sim-
plicity and transparency of the language. We have
no answer to this yet.


6 Conclusions 


We have presented an alternate approach to compo-
nent composition in the form of a new composition
language for JavaBeans components, the Bean Markup
Language (BML). BML is a declarative language that
uses the XML syntax to lower the adoption barrier for
both developers and machines. The language constructs
are few and simple, and are designed to capture in a
first-class manner the semantics of component composition.
In spite of its simplicity, BML provides support for
most major composition operations. In particular,
BML supports recursive composition, which assures
scalability and enables top-down design methodolo-
gies.


BML allows composers to author event filtering
scripts in arbitrary scripting languages, which opens
up JavaBeans component composition to non-Java
programmers. This is an important point that re-
inforces the notion that composition is not necessarily
a programmer’s job.


Still, BML does not address some relevant issues.
One is concurrency control, which can be a critical
issue in multiprocess environments. Finally, the ob-
jective of using BML as a vehicle to extend compo-
nent-oriented development requires that other com-
ponent models be supported, if possible through a
common composition language.


Abstract


Generic programming is a paradigm whose wide adop-
tion by the C++ community is quite recent. In this
scheme most classes and procedures are parameterized,
leading to the construction of general and efficient soft-
ware components. In this paper, we show how some de-
sign patterns from Gamma et al. can be adapted to this
paradigm. Although these patterns rely highly on dy-
namic binding, we show that, by intensive use of para-
metric polymorphism, the method calls in these patterns
can be resolved at compile-time. In intensive computa-
tions, the generic patterns bring a significant speed-up
compared to their classical peers.


1 Introduction 


This work has its origin in the development of Olena
[11], our image processing library. When designing a li-
brary, one wants to implement algorithms that work on
a wide variety of types without having to write a pro-
cedure for each concrete type. In short, one algorithm
should be generic enough to map to a single procedure.
In object-oriented programming this is achieved using
abstract types. Design Patterns, which are design struc-
tures that have often proved to be useful in scientific
computing, rely even more on abstract types and inclu-
sion polymorphism¹.


However, when it comes to numerical computing,
object-oriented designs can lead to a huge performance
loss, especially as a high number of virtual function
calls [7] may be required to perform operations over
an abstraction. Yet, rejecting design patterns for the sake
of efficiency seems radical.


¹ Inclusion polymorphism corresponds to virtual member functions
in C++, deferred functions in Eiffel, and primitive functions in Ada.


In this paper, we show that some design patterns from
Gamma et al. [10] can be adapted to generic program-
ming. To this end, virtual function calls are avoided by
replacing inclusion polymorphism with parametric poly-
morphism.


This paper presents patterns in C++, but, although they 
won't map directly to other languages because “gener- 
icity” differs from language to language, our work does 
not apply only to C++: our main focus is to devise flex- 
ible designs in contexts where efficiency is critical. In 
addition, C++ being a multi-paradigm programming lan- 
guage [28], the techniques described here can be limited 
to critical parts of the code dedicated to intensive com- 
putation. 


In section 2 we introduce generic programming and 
present its advantages over classical object-oriented pro- 
gramming. Then, section 3 presents and discusses the 
design of the following patterns: GENERIC BRIDGE, 
GENERIC ITERATOR, GENERIC ABSTRACT FACTORY, 
GENERIC TEMPLATE METHOD, GENERIC DECORA- 
TOR, and GENERIC VISITOR. We conclude and con- 
sider the perspectives of our work in section 4. 


2 Generic programming 


By “generic programming” we refer to a use of parame- 
terization which goes beyond simple genericity on data 
types. Generic programming is an abstract and efficient 
way of designing and assembling components [15] and
interfacing them with algorithms.


Generic programming is an attractive paradigm for sci-
entific numerical components [12] and numerous li-
braries are available on the Internet [22] for various do-
mains: containers, graphs, linear algebra, computational
geometry, differential equations, neural networks, visu-
alization, image processing, etc.


The most famous generic library is probably the Stan- 
dard Template Library [26]. In fact, generic program- 
ming appeared with the adoption of STL by the C++ 
standardization committee and was made possible with 
the addition of new generic capabilities to this lan- 
guage [27, 21]. 


Several generic programming idioms have already been 
discovered and many are listed in [30]. Most generic 
libraries use the GENERIC ITERATOR that we describe
in section 3.2. In POOMA [12], a scientific framework
for multi-dimensional arrays, fields, particles, and
transforms, the GENERIC ENVELOPE-LETTER pattern
appears. In the REQUESTED INTERFACE pattern [16], a
GENERIC BRIDGE is introduced to handle efficiently an 
adaptation layer which mediates between the interfaces 
of the servers and of the clients. 


2.1 Efficiency 


The way abstractions are handled in the object-oriented
programming paradigm can ruin performance, espe-
cially when the overhead implied by the abstract inter-
face used to access the data is significant in comparison
with the time needed to process the data.


For example, in an image processing library in which 
algorithms can work on many kinds of aggregates (two 
or three dimensional images, graphs, etc.), a procedure 
that adds a constant to an aggregate may be written using 
the object-oriented programming paradigm as follows. 


template< class T >
void add (aggregate<T>& input, T value)
{
  // every call on iter below goes through a virtual function
  iterator<T>& iter = input.create_iterator ();
  for (iter.first(); !iter.is_done(); iter.next())
    iter.current_item () += value;
}


Here, aggregate<T> and iterator<T> are abstract 
classes to support the numerous aggregates available: 
parameterization is used to achieve genericity on pixel 
types, and object-oriented abstractions are used to get 
genericity on the image structure.


Table 1: Timing of algorithms written in different
paradigms. (The code was compiled with gcc 2.95.2
and timed on an AMD K6-2 380MHz machine running
GNU/Linux.)
[Table body not recoverable from the source; the surviving
column headers are "dedicated C" and "classical C++".]


As a consequence, for each iteration the direct call to
T::operator+=() is drowned in the virtual calls to
current_item(), next() and is_done(), leading to
poor performance.


Table 1 compares classical object-oriented programming 
and generic programming and shows a speed-up factor 
of 3 to 4. The add test consists in the addition of a 
constant value to each element of an aggregate. The 
mean test replaces each element of an aggregate by the 
mean of its four neighbors. The durations correspond 
to 200 calls to these tests on a two dimensional image 
of 1024 x 1024 integers. “Dedicated C” corresponds to 
handwritten C specifically tuned for 2D images of inte- 
gers, so the difference with classical C++ is what people 
call the abstraction penalty. While this is not a textbook
case (we do have such algorithms in Olena), it is true
that the impact of object-oriented abstraction is usually
insignificant. High speed-ups are obtained from
generic programming compared to object-oriented pro-
gramming when data processing is cheap relative to
data access, for example in simple list iteration or ma-
trix multiplication.


The generic version of this algorithm, us-
ing a GENERIC ITERATOR, will be given in section 3.2.


2.2 Generic programming from the language 
point of view 


Generic programming relies on the use of several pro-
gramming language features, some of which are de-
scribed below.


Genericity is the main way of generalizing object- 
oriented code. Not all languages support both 
generic classes and generic procedures (e.g., Eiffel 
features only generic classes). 


Nested type names refer to the ability to look up a
type as a member of a class, which allows related
types (such as image2d and iterator2d) to be linked
together.


Constrained genericity is a way to restrict the possible 
values of formal parameters using signatures (e.g.,
when using ML functors [19]) or constraining a 
type to be a subclass of another (as in Eiffel or Ada 
95). C++ does not provide specific language fea- 
tures to support constrained genericity, but subclass 
constraints [29, 23] or feature requirements [18, 25] 
can be expressed using other available language fa- 
cilities [27]. 


Generic specialization allows the specialization of an
algorithm (e.g., dedicated to a particular data type),
overriding the generic implementation.


Not all languages support these features; this explains
why the patterns we present in C++ won’t apply directly 
to other languages. 
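Two of the features listed above, nested type names and generic specialization, can be illustrated with a small C++ sketch. Everything in it is hypothetical and not taken from Olena: the class image1d, the procedures sum and fill, and the strings returned by fill (which only serve to show which version was selected).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical aggregate exposing its related types as nested names.
template <class T>
class image1d
{
public:
  typedef T value_type;            // nested type name
  typedef std::size_t point_type;  // nested type name

  explicit image1d(std::size_t n) : data_(n) {}
  std::size_t size() const { return data_.size(); }
  T& operator[](point_type p) { return data_[p]; }
  const T& operator[](point_type p) const { return data_[p]; }

private:
  std::vector<T> data_;
};

// Generic procedure: the element type is obtained by nested type lookup.
template <class Image>
typename Image::value_type sum(const Image& ima)
{
  typedef typename Image::value_type value_type;
  value_type s = value_type();
  for (typename Image::point_type p = 0; p < ima.size(); ++p)
    s += ima[p];
  return s;
}

// Generic implementation of fill; the returned label is illustrative only.
template <class Image>
const char* fill(Image& ima, typename Image::value_type v)
{
  for (typename Image::point_type p = 0; p < ima.size(); ++p)
    ima[p] = v;
  return "generic";
}

// Generic specialization: a version dedicated to images of unsigned char
// overrides the generic implementation.
template <>
const char* fill< image1d<unsigned char> >(image1d<unsigned char>& ima,
                                           unsigned char v)
{
  if (ima.size() > 0)
    std::memset(&ima[0], v, ima.size());
  return "memset";
}
```

Calling fill on an image1d<int> selects the generic loop, whereas the same call on an image1d<unsigned char> is resolved at compile-time to the dedicated version; client code is identical in both cases.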


2.3 Generic programming guidelines 


From our experience in building Olena, which is entirely 
carried out by generic programming, we derived the fol-
lowing guidelines. These rules may seem drastic, but
their application can be limited to critical parts of the code
dedicated to intensive computation.


Guidelines for generic classes: 


• Avoid inclusion polymorphism.
  In other words, the type of a variable (static type,
  known at compile-time) is exactly that of the in-
  stance it holds (dynamic type, known at run-time).
  The main requirement of generic programming is
  that the concrete type of every object is known at
  compile-time.


• Avoid operation polymorphism.
  Abstract methods are forbidden: dynamic binding
  is too expensive. Simulate operation polymorphism
  with either: (i) parametric classes thanks to the Cu-
  riously Recurring Template idiom (see section 3.4),
  or (ii) parametric methods, which lead to a form of
  ad-hoc polymorphism (overloading).


• Use inheritance only to factor methods and to de-
  clare attributes shared by several subclasses.


Guidelines for procedures which use generic pat- 
terns: 


• Parameterize the procedures by the types of their
  inputs, even if the input itself is parameterized.


• Parameterize the procedures by the types of the
  components used (unless they can be obtained by
  a nested type lookup in another parameter-type).


3 Generic Design Patterns 


Our exposition of the generic design patterns follows Gamma et al.'s
description of the original, abstract version of the pat-
terns [10]. We do not repeat the elements that can be
found in this book.


3.1 Generic Bridge 


Intent 


Decouple an abstraction from its implementation so that 
the two can vary independently. 


Structure 






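[Structure diagram not recoverable from the source. Surviving
fragments: the abstraction's operation delegates through
imp->operation_imp(); the class concrete_implementor_a
provides operation_imp().]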


Participants 


An abstraction class is parameterized by the
Implementation used. Any (low-level) operation on
the abstraction is delegated to the implementation in-
stance.


Consequences


Because the implementation is statically bound to the 
abstraction, you can’t switch implementation at run- 
time. This kind of restriction is common to generic pro- 
gramming: configurations must be known at compile- 
time. 
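A minimal sketch of a GENERIC BRIDGE may help to see both the delegation and the compile-time restriction. The names used here (dense_storage, constant_storage, int_array and their members) are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two hypothetical implementations. They share an interface by convention
// only (get, set, size); no common abstract base class is declared.
class dense_storage
{
public:
  explicit dense_storage(std::size_t n) : data_(n, 0) {}
  int get(std::size_t i) const { return data_[i]; }
  void set(std::size_t i, int v) { data_[i] = v; }
  std::size_t size() const { return data_.size(); }
private:
  std::vector<int> data_;
};

class constant_storage
{
public:
  explicit constant_storage(std::size_t n) : n_(n), value_(0) {}
  int get(std::size_t) const { return value_; }
  void set(std::size_t, int v) { value_ = v; }  // all cells share one value
  std::size_t size() const { return n_; }
private:
  std::size_t n_;
  int value_;
};

// The abstraction is parameterized by the implementation it uses.
template <class Implementation>
class int_array
{
public:
  explicit int_array(std::size_t n) : imp_(n) {}
  int at(std::size_t i) const { return imp_.get(i); }
  void assign(std::size_t i, int v) { imp_.set(i, v); }
  int total() const
  {
    int s = 0;
    for (std::size_t i = 0; i < imp_.size(); ++i)
      s += imp_.get(i);
    return s;
  }
private:
  Implementation imp_;  // held by value: no pointer, no virtual call
};
```

The choice between int_array<dense_storage> and int_array<constant_storage> is made in the type of the variable, so it cannot change once the program is compiled; in exchange, every call to get() or set() is a direct call that the compiler is free to inline.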


Known Uses 


This pattern is really straightforward and broadly used 
in generic libraries. For example the allocator pa- 
rameter in STL containers is an instance of GENERIC 
BRIDGE. 


The POOMA team [5] uses the term engine to name im-
plementation classes that define the way matrices are
stored in memory. This is also a GENERIC BRIDGE.


The Ada 95 Rationale [14, section 12.6] gives an example
of GENERIC BRIDGE: a generic empty package (also
called signature) is used to allow multiple implementa-
tions of an abstraction (here, a mapping).


As in the case of the original patterns, the structure of
this pattern is the same as that of the GENERIC STRATEGY
pattern. These patterns share the same implementation.


3.2 Generic Iterator 


Intent 


To provide an efficient way to access the elements of 
an aggregate without exposing its underlying represen- 
tation. 


Motivation 


In numeric computing, data are often aggregates and al- 
gorithms usually need to work on several types of ag- 
gregate. Since there should be only one implementation 
of each algorithm, procedures must accept aggregates of 
various types as input and be able to browse their el- 
ements in some unified way; iterators are thus a very 
common tool. As an extra requirement compared to the 
original pattern, iterations must be efficient. 


Structure


<<type>> aggregate
  typedef value_type
  typedef iterator_type
  create_iterator() : iterator_type

<<type>> iterator
  iterator(aggregate&)
  first()
  next()
  is_done() : bool
  current_item() : aggregate::value_type&

{ aggregate::iterator_type = iterator }

<<implementation class>> concrete_aggregate<T>
  typedef value_type : T
  typedef iterator_type : concrete_iterator<T>
  create_iterator() : concrete_iterator<T>

<<implementation class>> concrete_iterator<T>
  concrete_iterator(concrete_aggregate<T>&)
  first()
  next()
  is_done() : bool
  current_item() : T&


We use typedef as a non-standard extension of 
UML [24] to represent type aliases in classes. 


Participants 


The term concept was coined by M. H. Austern [1], to 
name a set of requirements on a type in STL. A type 
which satisfies these requirements is a model of this con- 
cept. The notion of concept replaces the classical object- 
oriented notion of abstract class. 


For this pattern, two concepts are defined: aggregate and 
iterator, and two concrete classes model these concepts. 


Consequences 


Since no operation is polymorphic, iterating over an ag- 
gregate is more efficient while still being generic. More- 
over, the compiler can now perform additional optimiza- 
tions such as inlining, loop unrolling and instruction 
scheduling, that virtual function calls hindered. 


Efficiency is a serious advantage. However we lose the
dynamic behavior of the original pattern. For example
we cannot iterate over a tree whose cells do not have the
same type².


² A link between an abstract aggregate and the corresponding
generic procedures can be achieved using lazy compilation and dy-
namic loading of generic code [8].


Implementation


Although a concept is denoted in UML by the stereotype 
<<type>>, in C++ it does not lead to a type: a con- 
cept only exists in the documentation. Indeed the fact 
that concepts have no mapping in the C++ syntax makes 
early detection of programming errors difficult. Sev- 
eral tricks have been proposed to address this issue by 
explicitly checking that the arguments of an algorithm 
are models of the expected concepts [18, 25]. In Ada 
95, concept requirements (types, functions, procedures) 
can be captured by the formal parameters of an empty 
generic package (the signature idiom) [9]. 


For the user, a type-parameter (such as Aggregate_Model
in the sample code) represents a model of aggre-
gate and the corresponding model of iterator can then
be deduced statically.


Sample Code 


template< class T >
class buffer
{
public:
  typedef T data_type;
  typedef buffer_iterator<T> iterator_type;
  // ...
};


template< class Aggregate_Model >
void add(Aggregate_Model& input,
         typename Aggregate_Model::data_type value)
{
  // create_iterator() returns the iterator by value, so iter
  // is declared as an object rather than as a reference
  typename Aggregate_Model::iterator_type
    iter = input.create_iterator ();

  for (iter.first(); !iter.is_done(); iter.next())
    iter.current_item () += value;
} 
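The sample code above elides the definition of buffer_iterator and the body of buffer. The following self-contained sketch fills these gaps under stated assumptions: the storage in a std::vector and the helper members size() and at() are hypothetical and not taken from the original.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <class T> class buffer;

// A model of the iterator concept for buffer<T>; no method is virtual.
template <class T>
class buffer_iterator
{
public:
  explicit buffer_iterator(buffer<T>& b) : buf_(&b), pos_(0) {}
  void first() { pos_ = 0; }
  void next() { ++pos_; }
  bool is_done() const;
  T& current_item();
private:
  buffer<T>* buf_;
  std::size_t pos_;
};

// A model of the aggregate concept.
template <class T>
class buffer
{
public:
  typedef T data_type;
  typedef buffer_iterator<T> iterator_type;

  explicit buffer(std::size_t n, T init = T()) : data_(n, init) {}
  iterator_type create_iterator() { return iterator_type(*this); }
  std::size_t size() const { return data_.size(); }  // assumed helper
  T& at(std::size_t i) { return data_[i]; }          // assumed helper
private:
  std::vector<T> data_;
};

template <class T>
bool buffer_iterator<T>::is_done() const { return pos_ >= buf_->size(); }

template <class T>
T& buffer_iterator<T>::current_item() { return buf_->at(pos_); }

// The generic procedure: the iterator type is deduced statically
// from the aggregate type by nested type lookup.
template <class Aggregate_Model>
void add(Aggregate_Model& input,
         typename Aggregate_Model::data_type value)
{
  typename Aggregate_Model::iterator_type iter = input.create_iterator();
  for (iter.first(); !iter.is_done(); iter.next())
    iter.current_item() += value;
}
```

Since add() is instantiated separately for each aggregate type, the calls to first(), is_done(), next() and current_item() are ordinary non-virtual calls that the compiler can inline.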


Known Uses 


Most generic libraries, such as STL, use the GENERIC 
ITERATOR. 


Variations 


We translated the Gamma et al. version, with methods
first(), is_done(), and next() in the iterator class.
STL uses another approach where pointers should also
be models of iterators: as a consequence, iterators can-
not have methods and most of their operators will rely
on methods of the container’s class. This makes imple-
mentation of multiple schemes of iteration difficult: for
example compare a forward and a backward iteration in
STL:


container::iterator i;
for (i = c.begin(); i != c.end(); ++i)
  // ...


container::reverse_iterator i;
for (i = c.rbegin(); i != c.rend(); ++i)
  // ...


First, the syntax differs. From the STL point of view 
this is not a serious issue, because iterators are meant to 
be passed to algorithms as instances. For a wider use, 
however, this prevents parametric selection of the itera- 
tor (i.e., passing the iterator as a type). Second, you have 
to implement as many xbegin() and xend() methods 
as there are schemes of iteration, leading to a higher cou- 
pling [17] between iterators and containers. 


Another idea consists in the removal of all the itera-
tor related definitions, such as create_iterator() or
iterator_type, from concrete_aggregate<T> in
order to allow the addition of new iterators without mod-
ifying the existing aggregate classes [32]. This can be
achieved using traits classes [20] to associate iteration 
schemes with aggregates: the iterated aggregate instance 
is given as an argument to the iterator constructor. For 
example we would rewrite the add() function as fol- 
lows. 


template< class Aggregate_Model >
void add(Aggregate_Model& input,
         typename Aggregate_Model::data_type value)
{
  typename forward_iterator< Aggregate_Model >::type
    iter (input);

  for (iter.first(); !iter.is_done(); iter.next())
    iter.current_item () += value;
} 
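To make the traits-based variation concrete, here is a sketch that attaches a forward and a backward iteration scheme to std::vector without modifying it. The traits names forward_iterator and backward_iterator follow the text; the iterator classes vector_forward_iterator and vector_backward_iterator and the procedure number() are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical iterators over std::vector<T>, defined outside the container.
template <class T>
class vector_forward_iterator
{
public:
  explicit vector_forward_iterator(std::vector<T>& v) : v_(&v), pos_(0) {}
  void first() { pos_ = 0; }
  void next() { ++pos_; }
  bool is_done() const { return pos_ >= v_->size(); }
  T& current_item() { return (*v_)[pos_]; }
private:
  std::vector<T>* v_;
  std::size_t pos_;
};

template <class T>
class vector_backward_iterator
{
public:
  explicit vector_backward_iterator(std::vector<T>& v)
    : v_(&v), left_(v.size()) {}
  void first() { left_ = v_->size(); }
  void next() { --left_; }
  bool is_done() const { return left_ == 0; }
  T& current_item() { return (*v_)[left_ - 1]; }
private:
  std::vector<T>* v_;
  std::size_t left_;  // number of elements not yet visited
};

// Traits classes: the primary templates are only declared; each
// specialization associates an iteration scheme with an aggregate.
template <class Aggregate> struct forward_iterator;
template <class Aggregate> struct backward_iterator;

template <class T>
struct forward_iterator< std::vector<T> >
{ typedef vector_forward_iterator<T> type; };

template <class T>
struct backward_iterator< std::vector<T> >
{ typedef vector_backward_iterator<T> type; };

// The iteration scheme is passed as a type (a template template parameter);
// the elements are numbered in visiting order.
template <template <class> class Scheme, class Aggregate>
void number(Aggregate& input)
{
  typename Scheme<Aggregate>::type iter(input);
  int n = 0;
  for (iter.first(); !iter.is_done(); iter.next())
    iter.current_item() = n++;
}
```

A call such as number<backward_iterator>(v) differs from number<forward_iterator>(v) only by the scheme name, which is the parametric selection of the iterator that the STL syntax prevents.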


This eliminates the need to declare iterators in the ag-
gregate class, and allows further additions of iteration
schemes by the simple means of creating a new traits
class (for example backward_iterator<T>).


3.3 Generic Abstract Factory


Intent 


To create families of related or dependent objects. 


Motivation 


Let us go back over the different iteration schemes prob-
lem discussed previously. We want to define several kinds
of iterators for an aggregate, which makes this a candidate
for the ABSTRACT FACTORY pattern. The STL example
can be rewritten as follows to make this pattern explicit:
iterators are products, built by an aggregate which can
be seen as a factory.


factory_a::product_1 i;
for (i = c.begin(); i != c.end(); ++i)
  // ...

factory_a::product_2 i;
for (i = c.rbegin(); i != c.rend(); ++i)
  // ...


Implementing a GENERIC ABSTRACT FACTORY is
therefore just a matter of defining the product types in
the classes that should be used as a factory. This is much
simpler than the original pattern. Yet there is one sig-
nificant difference in usage: an ABSTRACT FACTORY
returns an object whereas a GENERIC ABSTRACT FAC-
TORY returns a type, giving more flexibility (e.g., con-
structors can be overloaded).


We have shown that if we want to implement multi-
ple iteration schemes, it is better to use traits classes,
to define the schemes out of the container. A traits
class is a GENERIC ABSTRACT FACTORY too (think of
trait::type as factory::product). But one issue
is that these two techniques are not homogeneous. Say
we want to add a new iterator to the STL containers: we
cannot change the container classes, therefore we define
our new iterator in a traits class, but now we must use a
different syntax depending on which iterator we use.


The structure we present here takes care of this: both in- 
ternal and external definitions of products can be made, 
but the user will always use the same syntax.


Structure


concrete_factory_1
  typedef product_a_type : product_a1
  typedef product_b_type : product_b1

concrete_factory_2

products: product_a1, product_b1, product_a2, product_b2

product_a_traits<F>
  typedef type : F::product_a_type

product_b_traits<F>
  typedef type : F::product_b_type

<<specialize>> (concrete_factory_2)
product_a_traits<concrete_factory_2>
  typedef type : product_a2

<<specialize>> (concrete_factory_2)
product_b_traits<concrete_factory_2>
  typedef type : product_b2

foo() [type-parameter: Factory]
{
  typename product_a_traits<Factory>::type a;
  // ...
}

Here, we represent a parametric method by boxing its
parameter. For instance, Factory is a type-parameter of
the method foo. This does not conform to UML
since UML lacks support for parametric methods.


Participants 


We have two factories, named concrete_factory_1
and concrete_factory_2, each of which defines two
products: product_a_type and product_b_type.
The first factory defines the products intrusively (in its
own class), while the second does it externally (in the prod-
uct’s traits).


To unify the utilization, the traits default is to use the
type that might be defined in the “factory” class. For
example the type of a in foo<Factory>, defined
as product_a_traits<Factory>::type, will be equal
to concrete_factory_1::product_a_type in the
case Factory is concrete_factory_1. 
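The structure and participants described above can be sketched as follows. The factory, product and traits names come from the text; the name() members and the string returned by foo() are hypothetical additions that only make the selected product types observable.

```cpp
#include <cassert>
#include <string>

// Products; name() is an illustrative member, not part of the pattern.
struct product_a1 { const char* name() const { return "a1"; } };
struct product_b1 { const char* name() const { return "b1"; } };
struct product_a2 { const char* name() const { return "a2"; } };
struct product_b2 { const char* name() const { return "b2"; } };

// The first factory defines its products intrusively.
struct concrete_factory_1
{
  typedef product_a1 product_a_type;
  typedef product_b1 product_b_type;
};

// The second factory defines nothing itself.
struct concrete_factory_2 {};

// Traits: by default, use the type defined in the factory class.
template <class F>
struct product_a_traits { typedef typename F::product_a_type type; };

template <class F>
struct product_b_traits { typedef typename F::product_b_type type; };

// External definition of the products of concrete_factory_2.
template <>
struct product_a_traits<concrete_factory_2> { typedef product_a2 type; };

template <>
struct product_b_traits<concrete_factory_2> { typedef product_b2 type; };

// Client code uses the same syntax whatever the factory.
template <class Factory>
std::string foo()
{
  typename product_a_traits<Factory>::type a;
  typename product_b_traits<Factory>::type b;
  return std::string(a.name()) + "+" + b.name();
}
```

Because the default traits are only instantiated for factories that are not specialized, concrete_factory_2 never needs to provide product_a_type or product_b_type.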


Consequences 


Contrary to the pattern of Gamma et al., inheritance is no
longer needed, neither for factories, nor for products. In-
troducing a new product merely requires adding a new
parametrized structure to handle the type aliases (e.g.,
product_c_traits), and specializing this structure
when the alias product_c_type is not provided by the
factory.


Known Uses 


Many uses of this pattern can be found in STL. For ex-
ample all the containers whose contents can be browsed
forwards or backwards³ define two products: forward
and backward iterators.


The actual type of a list iterator never explicitly appears
in client code, nor does any class name of concrete prod-
ucts. Rather, the user refers to A::iterator, where A is
an STL container used as a concrete factory.


3.4 Generic Template Method 


Intent 


To define the skeleton of an efficient algorithm in a supe-
rior class, deferring some steps to subclasses.


Motivation 


In generic programming, we limit inheritance to factor 
methods [section 2.3]; here, we want a superior class 
to define an operation some parts of which (primitive 
operations) are defined only in inferior classes. As usual 
we want calls to the primitive operations, as well as calls 
to the template method, to be resolved at compile-time. 


³ Vectors, doubly linked lists and deques are models of this con-
cept, named reversible containers.


Structure 







abstract_class<T>
  template_method()    { ... primitive_1(); ... primitive_2(); ... }
  primitive_1()        { static_cast<T&>(*this).primitive_1_impl(); }
  primitive_2()        { static_cast<T&>(*this).primitive_2_impl(); }

concrete_class : abstract_class<concrete_class>
  primitive_1_impl()
  primitive_2_impl()


Participants 


In the object-oriented paradigm, the selection of the tar- 
get function in a polymorphic operation can be seen as 
a search for the function, browsing the inheritance tree 
upwards from the dynamic type of the object. In prac- 
tice, this is done at run-time by looking up the target in 
a table of function pointers. 


In generic programming, we want that selection to be
resolved at compile-time. In other words, each caller
should statically know the dynamic type of the object 
from which it calls methods. In the case of a supe- 
rior class calling a method defined in a child class, the 
knowledge of the dynamic type can be given as a tem- 
plate parameter to the superior class. Therefore, any 
class needing to know its dynamic type will be parame- 
terized by its leaf type. 


The parametric class abstract_class defines two op-
erations: primitive_1() and primitive_2(). Call-
ing one of these operations leads to casting the target
object into its dynamic type. The methods executed are
the implementations of these operations, primitive_1_impl()
and primitive_2_impl(). Because the
object was cast into its leaf type, these functions are
searched for in the object hierarchy from the leaf type
up as desired.




When the programmer later defines the class 
concrete_class with the primitive operation 
implementations, the method template_method() is 
inherited and a call to this method leads to the execution 
of the proper implementations. 
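The participants described above can be written down directly. In this sketch the class and method names are those of the text, while the string return values and the second subclass other_class are hypothetical and only make the dispatch observable.

```cpp
#include <cassert>
#include <string>

// The superior class is parameterized by its leaf type T.
template <class T>
class abstract_class
{
public:
  // The template method is written once, in terms of the primitives.
  std::string template_method()
  {
    return "[" + primitive_1() + "|" + primitive_2() + "]";
  }

  // Each primitive casts the object to its leaf type and calls the
  // implementation found there; the call is resolved at compile-time.
  std::string primitive_1()
  { return static_cast<T&>(*this).primitive_1_impl(); }

  std::string primitive_2()
  { return static_cast<T&>(*this).primitive_2_impl(); }
};

class concrete_class : public abstract_class<concrete_class>
{
public:
  std::string primitive_1_impl() { return "one"; }
  std::string primitive_2_impl() { return "two"; }
};

// A second, hypothetical leaf class reusing the same template method.
class other_class : public abstract_class<other_class>
{
public:
  std::string primitive_1_impl() { return "uno"; }
  std::string primitive_2_impl() { return "dos"; }
};
```

Note that concrete_class and other_class do not share a common base type: abstract_class<concrete_class> and abstract_class<other_class> are distinct classes, which is why no virtual table is needed.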


Consequences 


In generic programming, operation polymorphism can 
be simulated by “parametric polymorphism through in- 
heritance” and then be solved statically. The cost of dy- 
namic binding is avoided; moreover, the compiler is able 
to inline all the code, including the template method it- 
self. Hence, this design is more efficient. 


Implementation 


The methods primitive_1() and primitive_2() do
not contain their implementation but a call to an imple- 
mentation; they can be considered as abstract methods. 
Please note that they can also be called by the client 
without knowing that some dispatch is performed. 


This design is made possible by the typing model used
for C++ template parameters. A C++ compiler has to
delay its semantic analysis of a template function un-
til the function is instantiated. The compiler will there-
fore accept the call to T::primitive_1_impl() with-
out knowing anything about T and will check the pres-
ence of this method later when the call to
A<T>::primitive_1() is actually performed, if it ever is. In
Ada [13], on the contrary, such postponed type checking
does not exist, for a function shall type check even if it is
not instantiated. This pattern is therefore not applicable
as is in this language.


One disadvantage of this pattern over Gamma’s imple- 
mentation is directly related to this: the compiler won't 
check the actual presence of the implementations in the 
subclasses. While a C++ compiler will warn you if you 
do not supply an implementation for an abstract func- 
tion, even if it is not used, that same compiler will be 
quiet if pseudo-virtual operations like primitive_1_impl()
are not defined and not used. Special care must
thus be taken when building libraries not to forget such 
functions since the error won’t come to light until the 
function is actually used. 


We purposely added the suffix _impl to the name of
primitives to distinguish the implementation functions.
One could imagine that the implementation would use the
same name as the primitive, but this requires some addi-
tional care as the abstract primitive can call itself recur-
sively when the implementation is absent.⁴


Sample Code 


The following code shows how to define a get_next() 
operation in each iterator of a library of containers. Ob- 
viously, get_next() is a template method made by 
issuing successive calls to the current_item() and 
next () methods of the actual iterator. 


We define this method in a superclass iterator_common
parametrized by its subtype, and have all iter-
ators derive from this class.


template< class Child, class Value_Type > 
class iterator_common 
{ 
public: 
Value_Type& get_next () ( 
// template method 
Value_Type& v = current_item (); 
next (); 
return v; 
} 
Value_Type& current_item () { 
// call the actual implementation 
static_cast<Child&>(*this) .current_item_impi (); 
} 
void next () { 
// call the actual implementation 
static_cast<Child&> (*this) .next_impl(); 
} 
as 


// sample iterator definition
template< class Value_Type >
class buffer_iterator: public
  iterator_common< buffer_iterator< Value_Type >,
                   Value_Type >
{
public:
  // returns a reference, as iterator_common expects
  Value_Type& current_item_impl () { ... };
  void next_impl () { ... };
  void reset () { ... };
  bool is_done () { ... };
  // ...
};
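

To make the sample above concrete, here is a minimal runnable sketch. The array_iterator class, its constructor arguments and its private members are assumptions introduced for illustration; only iterator_common follows the listing above.

```cpp
#include <cassert>
#include <cstddef>

template <class Child, class Value_Type>
class iterator_common {
public:
  // template method: read the current item, then advance
  Value_Type& get_next() {
    Value_Type& v = current_item();
    next();
    return v;
  }
  Value_Type& current_item() {
    return static_cast<Child&>(*this).current_item_impl();
  }
  void next() { static_cast<Child&>(*this).next_impl(); }
};

// Hypothetical iterator over a caller-owned array.
template <class Value_Type>
class array_iterator
  : public iterator_common<array_iterator<Value_Type>, Value_Type> {
public:
  array_iterator(Value_Type* data, std::size_t size)
    : data_(data), size_(size), pos_(0) {}

  Value_Type& current_item_impl() { return data_[pos_]; }
  void next_impl() { ++pos_; }
  void reset() { pos_ = 0; }
  bool is_done() const { return pos_ >= size_; }

private:
  Value_Type* data_;
  std::size_t size_;
  std::size_t pos_;
};

// Sums the elements by repeatedly calling the template method.
int sum_with_get_next() {
  int values[] = {1, 2, 3};
  array_iterator<int> it(values, 3);
  int sum = 0;
  while (!it.is_done())
    sum += it.get_next();
  return sum;
}
```

No virtual call is involved: get_next() resolves current_item_impl() and next_impl() statically through the Child parameter.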


Known Uses 


This pattern relies on an idiom called Curiously Recurring
Template [4], derived from the Barton and Nackman
Trick [2]. In [2] this idiom is used to define a binary
operator (for instance +) in a superclass from the
corresponding compound assignment operator (here +=)
defined in a subclass. Further examples are given in [30].


⁴ You can ensure at compile time that two functions (the
primitive and its implementation) are different by passing
their addresses to a helper template specialized for the
case where its two arguments are equal.
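

The use of the idiom attributed to [2] above can be sketched as follows. This is an illustration under assumptions, not the code of [2]: the names addable and meters are hypothetical.

```cpp
#include <cassert>

// The superclass defines the binary operator+ once, in terms
// of the operator+= that each subclass provides.
template <class Child>
class addable {
public:
  friend Child operator+(Child lhs, const Child& rhs) {
    lhs += rhs;   // resolved statically against Child
    return lhs;
  }
};

// Hypothetical subclass: it only has to define +=.
class meters : public addable<meters> {
public:
  explicit meters(int v) : value(v) {}
  meters& operator+=(const meters& other) {
    value += other.value;
    return *this;
  }
  int value;
};

int add_meters(int a, int b) {
  return (meters(a) + meters(b)).value;
}
```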


3.5 Generic Decorator 


Intent 


To efficiently add responsibilities to a set of objects,
or to replace functionality of a set of objects,
by means of subclassing.


Structure 


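[Figure: class diagram. concrete_component offers
operation(). A parametric decorator derives from its
parameter C, may add added_state, and overrides
operation() to call C::operation() followed by
added_behaviour(). Example of use:
concrete_decorator_b< concrete_component > dc;
dc.operation();]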


We use a special idiom: having a parametric class that
derives from one of its parameters. This is also known
as mixin inheritance⁵ [3].


Participants 


A class concrete_component, which can be decorated,
offers an operation operation(). Two
parametric decorators, concrete_decorator_a and
concrete_decorator_b, whose parameter is the
decorated type, override this operation.


⁵ Mixins are often used in Ada to simulate multiple inheritance [14].


Consequences


This pattern has two advantages over Gamma's. First,
any method that is not modified by the decorator is
automatically inherited, whereas Gamma's version uses
composition and must therefore delegate each unmodified
operation. Second, decoration can be applied to a set
of classes that are not related via inheritance. Therefore,
a decorator becomes truly generic.


On the other hand we lose the capability of dynamically 
adding a decoration to an object. 
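

These consequences can be seen in a minimal sketch of the structure. The string results and the extra method name() are assumptions added for illustration; the class names follow the Participants section.

```cpp
#include <cassert>
#include <string>

class concrete_component {
public:
  std::string operation() const { return "component"; }
  // Not touched by any decorator: inherited automatically.
  std::string name() const { return "c"; }
};

// Each decorator derives from its parameter (mixin inheritance).
template <class C>
class concrete_decorator_a : public C {
public:
  std::string operation() const { return C::operation() + "+a"; }
};

template <class C>
class concrete_decorator_b : public C {
public:
  std::string operation() const { return C::operation() + "+b"; }
  int added_state = 0;   // state added by the decoration
};

std::string decorate_twice() {
  // The decoration is fixed at compile time by the type.
  concrete_decorator_b< concrete_decorator_a<concrete_component> > dc;
  return dc.operation();
}

std::string inherited_name() {
  concrete_decorator_a<concrete_component> dc;
  return dc.name();   // no delegation code had to be written
}
```

The type of dc spells out the whole decoration chain, which is why a decoration cannot be added to an existing object at run time.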


Sample Code 


Decorating an iterator of STL is useful when a container 
holds structured data, and one wants to perform opera- 
tions only on a field of these data. In order to access 
this field, the decorator redefines the data access opera- 
tor operator* () of the iterator. 


// A basic red-green-blue struct
template< class T >
struct rgb
{
  typedef T red_type;
  red_type red;

  typedef T green_type;
  green_type green;

  typedef T blue_type;
  blue_type blue;
};


// An accessor class for the red field.
template< class T >
class get_red
{
public:
  typedef T input_type;
  typedef typename T::red_type output_type;

  static output_type&
  get (input_type& v) {
    return v.red;
  }

  static const output_type&
  get (const input_type& v) {
    return v.red;
  }
};


Note how the rgb<T> structure exposes the type of
each attribute. This makes cooperation between objects
easier: here the get_red accessor looks up the
red_type type member and doesn't have to know that
the fields of rgb<T> are of type T. get_red can therefore
apply to any type that features red and red_type; it is
not limited to rgb<T>.
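

As a small check of this claim, the sketch below applies get_red to a type that is unrelated to rgb<T>. The warm_pixel struct and its fields are hypothetical, introduced only for this illustration.

```cpp
#include <cassert>

template <class T>
class get_red {
public:
  typedef T input_type;
  typedef typename T::red_type output_type;

  static output_type& get(input_type& v) { return v.red; }
  static const output_type& get(const input_type& v) { return v.red; }
};

// Hypothetical type: it only has to expose red and red_type.
struct warm_pixel {
  typedef double red_type;
  red_type red;
  bool flagged;
};

double bump_red() {
  warm_pixel p = {0.25, false};
  get_red<warm_pixel>::get(p) += 0.5;   // writes through the accessor
  return p.red;
}
```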


// A decorator for any iterator
template< class Decorated,
          template< class > class Access >
class field_access: public Decorated
{
public:
  typedef typename Decorated::value_type value_type;
  typedef Access< value_type > accessor;
  typedef typename accessor::output_type output_type;

  field_access () : Decorated () {}
  field_access (const Decorated& d) : Decorated (d) {}

  // Overload operator*, use the given accessor
  // to get the proper field.
  output_type& operator* () {
    return accessor::get (Decorated::operator* ());
  }

  const output_type& operator* () const {
    return accessor::get (Decorated::operator* ());
  }
};


field_access is a decorator whose parameters are the
type of the decorated iterator and a helper class
which specifies the field to be accessed. Actually, this
second parameter is an example of the GENERIC STRATEGY
pattern [6, 30].


int main ()
{
  typedef std::list< rgb< int > > A;
  A input;
  // ... initialize the input list ...

  // Build decorated iterators.
  field_access< A::iterator, get_red >
    begin = input.begin (),
    end = input.end ();
  // Assign 10 to each red field.
  std::fill (begin, end, 10);
}


The std::fill() procedure is a standard STL algorithm
which assigns a value to each element of a range
(specified by two iterators). Since std::fill() is here
given decorated iterators, it will only assign 10 to the
red fields.


Note that the decorator is independent of the decorated
iterator: it can apply to any STL iterator, not only
list<T>::iterator. The std::fill() algorithm
will use methods of field_access inherited from the
decorated iterator, such as the assignment, comparison,
and pre-increment operators.


Known Uses


Parameterized inheritance is also called mixin inheritance
and is one way to simulate multiple inheritance
in Ada 95 [14]. It can also be used as an alternative
way of providing template methods [6].


3.6 Generic Visitor


Intent


To define a new operation for the concrete classes of a 
hierarchy without modifying the hierarchy. 


Motivation 


In the case of the VISITOR pattern, the operation varies
with the type of the operation target. Since we assume
that the exact type is known at compile time, a trivial
design is to define this operation as a procedure overloaded
for each target. Such a design, however, does not have
the advantages of the translation of the VISITOR pattern
proposed in the next section.
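

For reference, the trivial design mentioned above might look like the following sketch. The circle and square types and the describe procedure are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>

struct circle { double radius; };
struct square { double side; };

// The operation is a free procedure, overloaded for each target.
inline std::string describe(const circle&) { return "circle"; }
inline std::string describe(const square&) { return "square"; }

// Generic code picks the right overload at compile time,
// because the exact type of the target is known.
template <class Shape>
std::string describe_twice(const Shape& s) {
  return describe(s) + describe(s);
}

std::string describe_circle_twice() {
  circle c = {1.0};
  return describe_twice(c);
}
```

Here the operation is a plain procedure, so it cannot carry state across calls the way a visitor object can.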


Structure 


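[Figure: class diagram. concrete_element_a derives from
element<concrete_element_a>; the superclass element<T>
calls v.visit (static_cast<T&>(*this)).
concrete_visitor_1 provides the overloads
visit (e : concrete_element_a &) and
visit (e : concrete_element_b &);
concrete_visitor_2 provides a single
visit (e : Element &).]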
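

The structure can be sketched in code as follows. This is an illustration under assumptions: the log member and the strings appended to it are hypothetical, and accept returns nothing here for simplicity.

```cpp
#include <cassert>
#include <string>

// accept is written once, in the superclass, and forwards
// the exact element type to the visitor.
template <class T>
class element {
public:
  template <class Visitor>
  void accept(Visitor& v) { v.visit(static_cast<T&>(*this)); }
};

class concrete_element_a : public element<concrete_element_a> {};
class concrete_element_b : public element<concrete_element_b> {};

// Visitor with one overloaded visit per element type.
class concrete_visitor_1 {
public:
  void visit(concrete_element_a&) { log += "a"; }
  void visit(concrete_element_b&) { log += "b"; }
  std::string log;
};

// Visitor with a member template providing a default.
class concrete_visitor_2 {
public:
  template <class Element>
  void visit(Element&) { log += "?"; }
  std::string log;
};

// The default is specialized outside the class for one element.
template <>
inline void
concrete_visitor_2::visit<concrete_element_b>(concrete_element_b&) {
  log += "b";
}

template <class Visitor>
std::string visit_both() {
  concrete_element_a a;
  concrete_element_b b;
  Visitor v;
  a.accept(v);
  b.accept(v);
  return v.log;
}
```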





Participants 


In Gamma's original pattern, the method accept has
to be defined in each element. The code of each of
these accept methods can be the same⁶; only the type of
the this pointer changes. Here we use the same trick as in
the GENERIC TEMPLATE METHOD to factor accept into the
superclass.


Each visitor defines a method visit for each element
type that it must handle. visit can be either an
overloaded function (as in concrete_visitor_1) or
a function template (as in concrete_visitor_2). In
both cases, the overload resolution or function
instantiation is made possible by the exact knowledge of
the element type.


One advantage of using a member template (as in
concrete_visitor_2) over an overloaded function
(as in concrete_visitor_1) is that the
concrete_visitor_2 class does not need to be changed when
new types are added: the visitor can be specialized
externally should the default be inadequate.


⁶ This is not actually the case in Gamma's book, because
the name of the visiting method to call depends on the
element type; however, using the same name (visit) for all
these methods poses no problem in a language such as C++,
which supports function overloading.


Consequences


The code is much closer to Gamma's than the
trivial design presented before, because the visitor is
here an object, with all the advantages this brings
(state, lifetime).


While accept and visit do not return anything
in the original pattern, they can be taught to. In the
GENERIC VISITOR they can even return a type that depends
on the visitor's type, as the following example
shows.


Sample Code


Let's consider an image_2d class whose pixels
should be addressable using different kinds of positions
(Cartesian or polar coordinates, etc.). For better
modularity, we don't want image_2d to know all position
types. Therefore we see positions as visitors, which the
image accepts. accept returns the pixel value
corresponding to the supplied position. The image will
provide only one access method, and it is up to the visitor
to perform the necessary conversion (e.g. polar to
Cartesian) to use this interface.


A position may also refer to a particular channel in a
color image. The return type of accept is thus dependent
on the visitor. We will use a traits class to handle this.


template< class Visitor, class Visited > 
struct visit_return_trait; 


For each pair (Visitor, Visited),
visit_return_trait<Visitor, Visited>::type is the return
type of accept and visit.


// factor the definition of accept for all images
template < class Child >
class image {
public:
  template < typename Visitor >
  typename visit_return_trait< Visitor, Child >::type
  accept (Visitor& v) {
    return v.visit (static_cast< Child& > (*this));
  }

  // ... likewise for const accept
};


template< typename T >
class image_2d : public image< image_2d< T > > {
public:
  typedef T pixel_type;
  // ...
  T& get_value (int row, int col) {...}
  const T& get_value (int row, int col) const {...}
};


Here is one possible visitor, with its corresponding
visit_return_trait specialization.


class point_2d {
public:
  point_2d (int row, int col) { ... }

  template < typename Visited >
  typename Visited::pixel_type&
  visit (Visited& v) {
    return v.get_value (row, col);
  }
  // ...
  int row, col;
};


template< class Visited >
struct visit_return_trait< point_2d, Visited > {
  typedef typename Visited::pixel_type type;
};


channel_point_2d is another visitor, which must be
parameterized to access a particular layer (as in the
decorator example).


template< template< class > class Access >
class channel_point_2d {
public:
  channel_point_2d (int row, int col) { ... }

  template < typename Visited >
  typename Access< typename Visited::pixel_type >::
    output_type& visit (Visited& v) {
    return Access< typename Visited::pixel_type >::
      get (v.get_value (row, col));
  }
  // ...
};


template< template< class > class Access,
          class Visited >
struct visit_return_trait
  < channel_point_2d< Access >, Visited > {
  typedef typename
    Access< typename Visited::pixel_type >::
      output_type type;
};


Finally, the following hypothetical main shows how the
return value of accept differs according to the visitor
used.


int main () {
  image_2d< rgb< int > > img;
  point_2d p(1, 2);
  channel_point_2d<get_red> q(3, 4);

  rgb<int> w = img.accept (p);  // the whole pixel
  int v = img.accept (q);       // the red channel only
}


In our library, accept and visit are both named
operator[], so we can write img[p] or p[img] at
will.
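

The pieces of this section can be assembled into the following runnable sketch. It is an illustration under assumptions rather than the library's code: the std::vector storage, the image_2d constructor, the simplified rgb and get_red, and the operator[] members that merely forward to accept and visit are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <class T>
struct rgb { typedef T red_type; T red, green, blue; };

template <class T>
struct get_red {
  typedef typename T::red_type output_type;
  static output_type& get(T& v) { return v.red; }
};

template <class Visitor, class Visited>
struct visit_return_trait;

template <class Child>
class image {
public:
  template <typename Visitor>
  typename visit_return_trait<Visitor, Child>::type accept(Visitor& v) {
    return v.visit(static_cast<Child&>(*this));
  }
  template <typename Visitor>   // allows img[p]
  typename visit_return_trait<Visitor, Child>::type operator[](Visitor& v) {
    return accept(v);
  }
};

template <typename T>
class image_2d : public image< image_2d<T> > {
public:
  typedef T pixel_type;
  image_2d(int rows, int cols)
    : cols_(cols), data_(static_cast<std::size_t>(rows * cols)) {}
  T& get_value(int row, int col) {
    return data_[static_cast<std::size_t>(row * cols_ + col)];
  }
private:
  int cols_;
  std::vector<T> data_;   // pixels start value-initialized
};

class point_2d {
public:
  point_2d(int r, int c) : row(r), col(c) {}
  template <typename Visited>
  typename Visited::pixel_type& visit(Visited& v) {
    return v.get_value(row, col);
  }
  template <typename Visited>   // allows p[img]
  typename Visited::pixel_type& operator[](Visited& v) { return visit(v); }
private:
  int row, col;
};

template <class Visited>
struct visit_return_trait<point_2d, Visited> {
  typedef typename Visited::pixel_type type;
};

template <template <class> class Access>
class channel_point_2d {
public:
  channel_point_2d(int r, int c) : row(r), col(c) {}
  template <typename Visited>
  typename Access<typename Visited::pixel_type>::output_type&
  visit(Visited& v) {
    return Access<typename Visited::pixel_type>::get(v.get_value(row, col));
  }
private:
  int row, col;
};

template <template <class> class Access, class Visited>
struct visit_return_trait<channel_point_2d<Access>, Visited> {
  typedef typename Access<typename Visited::pixel_type>::output_type type;
};

bool demo_visitors() {
  image_2d< rgb<int> > img(4, 5);
  point_2d p(1, 2);
  channel_point_2d<get_red> q(1, 2);
  p[img].red = 7;               // visit yields a reference to the pixel
  rgb<int> w = img.accept(p);   // whole pixel
  int v = img.accept(q);        // red channel only
  return w.red == 7 && v == 7 && img[p].red == 7;
}
```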


4 Conclusion and Perspectives 


Building on object-oriented programming, generic
programming makes it possible to build and assemble
reusable components [15] and has proved useful where
efficiency is required.


Since generic programming (or more generally generative
programming [31, 6]) is becoming more popular,
and because much experience and knowledge have been
accumulated and assimilated in structuring
object-oriented programs, we believe that it is time to
explore the benefits that the former can derive from
well-proven designs in the latter.


We showed how design patterns can be adapted to the
generic programming context by presenting the generic
versions of six fundamental patterns from Gamma et
al. [10]: the GENERIC BRIDGE, the GENERIC ITERATOR,
the GENERIC ABSTRACT FACTORY, the GENERIC
TEMPLATE METHOD, the GENERIC DECORATOR, and
the GENERIC VISITOR. We hope that such work can
provide some valuable insight and aid in designing larger
systems using generic programming.


Availability


The source of the patterns presented in this paper, as